Week of October 23rd

Part III of a Cloud️ Journey

“I’m learning to fly ✈️, around the clouds ☁️

Hi All –

Happy National Mole Day! 👨‍🔬👩🏾‍🔬

“Open your eyes 👀, look up to the skies🌤 and see 👓

As we have all learned, Cloud computing ☁️ empowers us all to focus our time 🕰 on dreaming 😴 up and creating the next great scalable ⚖️ applications. In addition, Cloud computing☁️ means less time 🕰 worrying😨 about infrastructure, managing and maintaining deployment environments, or agonizing😰 over security🔒. Google evangelizes these principles more strongly💪 than any other company in the world 🌎.

Google’s strategy for cloud computing☁️ is differentiated by providing open source runtime systems and a high-quality developer experience where organizations can easily move workloads from one cloud☁️ provider to another.

Once again, this past week we continued our exploration of GCP by finishing the last two courses in the Google Cloud Certified Associate Cloud Engineer Path on Pluralsight. Our ultimate goal was to have a better understanding of the various GCP services and features and be able to apply this knowledge to better analyze requirements and evaluate the numerous options available in GCP. Fortunately, we gained this knowledge and a whole lot more! 😊

Guiding us through another great introduction on Elastic Google Cloud Infrastructure: Scaling and Automation were well-known friends Phillip Maier and Mylene Biddle. Then taking us the rest of the way through this amazing course was the always passionate Priyanka Vergadia.

Then finally taking us down the home stretch 🏇 with Architecting with Google Kubernetes Engine – Foundations (which was the last of this amazing series of Google Goodness 😊) were famous Googlers Evan Jones and Brice Rice. …And just to put the finishing touches 👐 on this magical 🎩💫 mystery tour was Eoin Carrol, who gave us an in-depth look at Google’s game changer for modernizing existing applications and building cloud-native☁️ apps anywhere with Anthos.

After a familiar introduction by Philip and Mylene we began delving into the comprehensive and flexible 🧘‍♀️ infrastructure and platform services provided by GCP.

“Across the clouds☁️ I see my shadow fly✈️… Out of the corner of my watering💦 eye👁

Interconnecting Networks – There are 5 ways of connecting your infrastructure to GCP:

  1. Cloud VPN
  2. Dedicated interconnect
  3. Partner interconnect
  4. Direct peering
  5. Carrier peering

Cloud VPN – securely connects your on-premises network to your GCP VPC network. In order to connect to your on-premises network via Cloud VPN, you configure the Cloud VPN gateway, the peer (on-premises) VPN gateway, and the VPN tunnels between them.

  • Useful for low-volume connections
  • 99.9% SLA
  • Supports:
    • Site-to-site VPN
    • Static routes
    • Dynamic routes (Cloud Router)
    • IKEv1 and IKEv2 ciphers

Please note: The maximum transmission unit or MTU for your on-premises VPN gateway cannot be greater than 1460 bytes.

$gcloud compute --project "qwiklabs-gcp-02-9474b560327d" target-vpn-gateways create "vpn-1" --region "us-central1" --network "vpn-network-1"

$gcloud compute --project "qwiklabs-gcp-02-9474b560327d" forwarding-rules create "vpn-1-rule-esp" --region "us-central1" --address "35.192.18.39" --ip-protocol "ESP" --target-vpn-gateway "vpn-1"

$gcloud compute --project "qwiklabs-gcp-02-9474b560327d" forwarding-rules create "vpn-1-rule-udp500" --region "us-central1" --address "35.192.18.39" --ip-protocol "UDP" --ports "500" --target-vpn-gateway "vpn-1"

$gcloud compute --project "qwiklabs-gcp-02-9474b560327d" forwarding-rules create "vpn-1-rule-udp4500" --region "us-central1" --address "35.192.18.39" --ip-protocol "UDP" --ports "4500" --target-vpn-gateway "vpn-1"

$gcloud compute --project "qwiklabs-gcp-02-9474b560327d" vpn-tunnels create "tunnel1to2" --region "us-central1" --ike-version "2" --target-vpn-gateway "vpn-1"

$gcloud compute --project "qwiklabs-gcp-02-9474b560327d" vpn-tunnels create "vpn-1-tunnel-2" --region "us-central1" --peer-address "34.78.144.99" --shared-secret "GCP rocks" --ike-version "2" --local-traffic-selector "0.0.0.0/0" --target-vpn-gateway "vpn-1"

Cloud Interconnect and Peering

Dedicated connections provide a direct connection to Google’s network, while shared connections provide a connection to Google’s network through a partner.

Comparison of Interconnect Options

  • IPsec VPN Tunnel – Encrypted tunnel to VPC networks through the public internet
    • Capacity 1.5-3 Gbps/Tunnel
    • Requirements VPN Gateway
    • Access Type – Internal IP Addresses
  • Cloud Interconnect – Dedicated interconnect provides direct physical connections
    • Capacity 10 Gbps/link – 100 Gbps/link
    • Requirements – connection in colocation facility
    • Access Type– Internal IP Addresses
  • Partner Interconnect– provides connectivity through a supported service provider
    • 50 Mbps -10 Gbps/connection
    • Requirements – Service provider
    • Access Type– Internal IP Addresses

Peering

Direct Peering provides a direct connection between your business network and Google.

  • Broad-reaching edge network locations
  • Capacity 10 Gbps/link
  • Exchange BGP routes
  • Reach all of Google’s services
  • Peering requirement (Connection in GCP PoPs)
  • Access Type: Public IP Addresses

Carrier Peering provides connectivity through a supported partner

  • Carrier Peering partner
  • Capacity varies based on parent offering
  • Reach all of Google’s services
  • Partner requirements
  • No SLA
  • Access Type: Public IP Addresses

Choosing the right connection – Decision Tree 🌳

Shared VPC and VPC Peering

Shared VPC allows an organization to connect resources from multiple projects to a common VPC network.

VPC Peering is a decentralized or distributed approach to multi project networking because each VPC network may remain under the control of separate administrator groups and maintains its own global firewall and routing tables.
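
To make this concrete, here is a minimal sketch of setting up VPC Peering with gcloud (the network and project names are hypothetical); note that the peering has to be created from both sides before it becomes active:

$ gcloud compute networks peerings create peer-a-to-b --network network-a --peer-project project-b --peer-network network-b
$ gcloud compute networks peerings create peer-b-to-a --network network-b --peer-project project-a --peer-network network-a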

Load Balancing 🏋️‍♂️ and Autoscaling

Cloud Load Balancing 🏋️‍♂️ – distributes user traffic 🚦across multiple instances of your applications. By spreading the load, load balancing 🏋️‍♂️ reduces the risk that your applications experience performance issues. There are 2 basic categories of Load balancers:  Global load balancing 🏋️‍♂️ and Regional load balancing 🏋️‍♂️.

Global load balancers – used when workloads are distributed across the world 🌎. Global load balancers route traffic🚦 to a backend service in the region closest to the user, to reduce latency.

They are software-defined, distributed systems that use the Google Front End (GFE), which resides in Google’s PoPs and is distributed globally.

Types of Global Load Balancers

  • External HTTP and HTTPS (Layer 7)
  • SSL Proxy (Layer 4)
  • TCP Proxy (Layer 4)

Regional load balancers – when all workloads are in the same region

  • Regional load balancing 🏋️‍♂️ route traffic🚦within a given region.
  • Regional Load balancing 🏋️‍♂️ uses internal and network load balancers.

Internal load balancers are software-defined, distributed systems (built on Andromeda), while network load balancers use the Maglev distributed system.

Types of Regional Load Balancers

  • Internal TCP/UDP (Layer 4)
  • Internal HTTP and HTTPS (Layer 7)
  • TCP/UDP Network (Layer 4)

Managed instance groups – a collection of identical virtual machine instances that you control as a single entity. (Same as creating a VM, but applying specific rules to an instance group.)

  • Deploy identical instances based on instance template
  • Instance group can be resized
  • Manager ensures all instances are Running
  • Typically used with autoscaling ⚖️
  • Can be single zone or regional

Regional managed instance groups are usually recommended over zonal managed instance groups because they allow you to spread the application’s load across multiple zones through replication and protect against zonal failures.

         Steps to create a Managed Instance Group:

  1. Decide the location and whether the instance group will be single-zone or multi-zone
  2. Choose the ports that you are going to allow load balancing🏋️‍♂️ across.
  3. Select the instance template
  4. Decide on autoscaling ⚖️ and the criteria for its use
  5. Create a health ⛑ check to determine instance health and how traffic🚦 should route
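
As a rough sketch of those steps with gcloud (the template and group names here are hypothetical), you first create an instance template and then the managed instance group from it:

$ gcloud compute instance-templates create my-template --machine-type e2-medium --image-family debian-11 --image-project debian-cloud
$ gcloud compute instance-groups managed create my-mig --region us-central1 --template my-template --size 3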

Autoscaling and health checks

Managed instance groups offer autoscaling ⚖️ capabilities

  • Dynamically add/remove instances
    • Increases in load
    • Decrease in load
  • Autoscaling policy
    • CPU Utilization
    • Load balancing 🏋️‍♂️capacity
    • Monitoring 🎛 metrics
    • Queue-based workload
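
For example, a hedged sketch of attaching an autoscaling policy to the managed instance group above (the thresholds are made up for illustration):

$ gcloud compute instance-groups managed set-autoscaling my-mig --region us-central1 --min-num-replicas 2 --max-num-replicas 10 --target-cpu-utilization 0.60 --cool-down-period 90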

HTTP/HTTPs load balancing

  • Target HTTP/HTTPS proxy
  • One signed SSL certificate installed (minimum)
  • Client SSL session terminates at the load balancer 🏋️‍♂️
  • Support the QUIC transport layer protocol
  • Global load balancing🏋️‍♂️
  • Anycast IP address
  • HTTP on port 80 or 8080
  • HTTPs on port 443
  • IPv4 or IPv6
  • Autoscaling⚖️
  • URL maps 🗺

Backend Services

  • Health ⛑ check
  • Session affinity (Optional)
  • Time out setting (30-sec default)
  • One or more backends
    • An instance group (managed or unmanaged)
    • A balancing mode (CPU utilization or RPS)
    • A capacity scaler ⚖️ (ceiling % of CPU/Rate targets)
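
A minimal sketch of wiring a backend service to a managed instance group with gcloud (names are hypothetical, and the URL map / target proxy / forwarding rule steps of a full HTTP(S) load balancer are omitted):

$ gcloud compute health-checks create http my-health-check --port 80
$ gcloud compute backend-services create my-backend-service --protocol HTTP --health-checks my-health-check --global
$ gcloud compute backend-services add-backend my-backend-service --instance-group my-mig --instance-group-region us-central1 --global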

SSL certificates

  • Required for HTTP/HTTPS load balancing 🏋️‍♂️
  • Up to 10 SSL certificates /target proxy
  • Create an SSL certificate resource 

SSL proxy load balancing – global 🌎 load balancing service for encrypted non-http traffic.

  • Global 🌎 load balancing for encrypted non-HTTP traffic 🚦
  • Terminate SSL session at load balancing🏋️‍♂️ Layer
  • IPv4 or IPv6 clients
  • Benefits:
    • Intelligent routing
    • Certificate management
    • Security🔒 patching
    • SSL policies

TCP proxy load balancing – a global load balancing service for unencrypted, non-HTTP traffic.

  • Global load balancing for unencrypted non-HTTP traffic 🚦
  • Terminates TCP session at load balancing🏋️‍♂️ Layer
  • IPv4 or IPv6 clients
  • Benefits:
    • Intelligent routing
    • Security🔒 patching

Network load balancing – is a regional non-proxy load balancing service.

  • Regional, non-proxied load balancer
  • Forwarding rules (IP protocol data)
  • Traffic:
    • UDP
    • TCP/SSL ports
  • Backends:
    • Instance group
    • Target pool🏊‍♂️ resources – define a group of instances that receive incoming traffic 🚦from forwarding rules:
      • Forwarding rules (TCP and UDP)
      • Up to 50 per project
      • One health check
      • Instances must be in the same region

Internal load balancing – is a regional private load balancing service for TCP and UDP based traffic🚦

  • Regional, private load balancing 🏋️‍♂️
    • VM instances in same region
    • RFC 1918 IP address
  • TCP/UDP traffic 🚦
    • Reduced latency, simpler configuration
    • Software-defined, fully distributed load balancing (is not based on a device or a virtual machine.)

“..clouds☁️ roll by reeling is what they say … or is it just my way?”

Infrastructure Automation – Infrastructure as code (IaC)

Automate repeatable tasks like provisioning, configuration, and deployments for one machine or millions.

Deployment Manager – is an infrastructure deployment service that automates the creation and management of GCP resources. By defining templates, you only have to specify the resources once and then you can reuse them whenever you want.

  • Deployment Manager is an infrastructure automation tool🛠
  • Declarative Language (allows you to specify what the configuration should be and let the system figure out the steps to take)
  • Focus on the application
  • Parallel deployment
  • Template-driven

Deployment manager creates all the resources in parallel.
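
As a hedged sketch, a Deployment Manager run is just a YAML config plus a couple of gcloud commands (the config file and deployment name here are hypothetical):

$ gcloud deployment-manager deployments create my-deployment --config my-config.yaml
$ gcloud deployment-manager deployments describe my-deployment
$ gcloud deployment-manager deployments delete my-deployment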

Additional infrastructure as code tools🛠 for GCP

  • Terraform 🌱
  • Chef 👨‍🍳
  • Puppet
  • Ansible
  • Packer 📦

It’s recommended that you provision and manage resources on GCP with the tools🛠 you are already familiar with.

GCP Marketplace 🛒 – lets you quickly deploy functional software packages that run 🏃‍♂️ on GCP.

  • Deploy production-grade solutions
  • Single bill for GCP and third-party services
  • Manage solutions using Deployment Manager
  • Notification 📩 when a security update is available
  • Direct access to partner support

“… Must be the clouds☁️☁️ in my eyes 👀

Managed Services – automates common activities, such as change requests, monitoring 🎛, patch management, security🔒, and backup services, and provides full lifecycle 🔄 services to provision, run🏃‍♂️, and support your infrastructure.

BigQuery is GCP’s serverless, highly scalable⚖️, and cost-effective cloud data warehouse

  • Fully managed
  • Petabyte scale
  • SQL interface
  • Very fast🏃‍♂️
  • Free usage tier
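
To get a feel for the SQL interface, here is a sketch of a query against one of the BigQuery public datasets using the bq command-line tool:

$ bq query --use_legacy_sql=false 'SELECT name, SUM(number) AS total FROM `bigquery-public-data.usa_names.usa_1910_2013` GROUP BY name ORDER BY total DESC LIMIT 10'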

Cloud Dataflow🚰 executes a wide variety of data processing patterns

  • Server-less, fully managed data processing
  • Batch and stream processing with autoscale ⚖️
  • Open source programming using Apache Beam 🎇
  • Intelligently scale to millions of QPS

Cloud Dataprep – visually explore, clean, and prepare data for analysis and machine learning

  • Serverless, works at any scale⚖️
  • Suggest ideal analysis
  • Integrated partner service operated by Trifacta

Cloud Dataproc – is a service for running Apache Spark 💥 and Apache Hadoop 🐘 clusters

  • Low cost (per-second, preemptible)
  • Super-fast🏃‍♂️ to start, scale ⚖️, and shut down
  • Integrated with GCP
  • Managed Service
  • Simple and familiar

“Captain Jack will get you by tonight 🌃… Just a little push, and you’ll be smilin’😊 “

Architecting with Google Kubernetes Engine – Foundations

After a stellar🌟 job by Priyanka taking us through the load balancing 🏋️‍♂️ options, infrastructure as code, and some of the managed service options in GCP, it was time to take the helm⛵️ and get our K8s ☸️ hat 🧢 on.

Cloud Computing and Google Cloud

Just to whet our appetite 😋 for Cloud Computing ☁️, Evan takes us through 5 fundamental attributes:

  1. On-demand self-service (No🚫 human intervention needed to get resources)
  2. Broad network access (Access from anywhere)
  3. Resource Pooling🏊‍♂️ (Provider shares resources to customers)
  4. Rapid Elasticity 🧘‍♀️ (Get more resources quickly as needed)
  5. Measured 📏Service (Pay only for what you consume)

Next, Evan introduced some of the GCP Services under Compute, like Compute Engine, Google Kubernetes Engine (GKE), App Engine, and Cloud Functions. He then discussed some of Google’s managed services for Storage, Big Data, and Machine Learning.

Resource Management

Network

  • GCP provides resources in multi-regions, regions, and zones.
  • GCP divides the world 🌎 up into 3 multi-regional areas: the Americas, Europe, and Asia Pacific.
  • The 3 multi-regional areas are divided into regions, which are independent geographic areas on the same continent.
  • Regions are divided into zones (like a data center) which are deployment areas for GCP resources.

The network interconnect with the public internet at more than 90 internet exchanges and more than 100 points of presence worldwide🌎 (and growing)

  • Zonal resources operate exclusively in a single zone
  • Regional resources span multiple zones
  • Global resources could be managed across multiple regions.
  • Resources have hierarchy
    • Organization is the root node of a GCP resource hierarchy
    • Folders reflect their hierarchy of your enterprise
    • Projects are identified by unique Project ID and Project Number
  • Cloud Identity Access Management (IAM) allows you to fine tune access controls on all resources in GCP.

Billing

  • Billing account pay for project resources
  • A billing account is linked to one or more projects
  • Charged automatically or invoiced every month or at threshold limit
  • Subaccounts can be used for separate billing for projects

How to keep billing under control

  1. Budgets and alerts 🔔
  2. Billing export 🧾
  3. Reports 📊
  4. Quotas are helpful limits
    • Quotas apply at the level of a GCP Project
    • There are two types of quotas
      • Rate quotas reset after a specific time 🕰
      • Allocation quotas govern the number of resources in projects

GCP implements quotas, which limit unforeseen extra billing charges. Quotas are designed to prevent the overconsumption of resources because of an error or a malicious attack 👿.

Interacting with GCP – There are 4 ways to interact with GCP:

  1. Cloud Console
  • Web-based GUI to manage all Google Cloud resources
  • Executes common task using simple mouse clicks
  • Provides visibility into GCP projects and resources

2. SDK

3. Cloud Shell

  • gcloud
  • kubectl
  • gsutil
  • bq
  • bt
  • Temporary Compute Engine VM
  • Command-line access to the instance through a browser
  • 5 GB of persistent disk storage ($HOME dir)
  • Preinstalled Cloud SDK and other tools🛠
    • gcloud: for working with Compute Engine, Google Kubernetes Engine (GKE) and many Google Cloud services
    • gsutil: for working with Cloud Storage
    • kubectl: for working with GKE and Kubernetes
    • bq: for working with BigQuery
  • Language support for Java ☕️, Go, Python 🐍, Node.js, PHP, and Ruby♦️
  • Web 🕸preview functionality
  • Built-in authorization for access to resources and instances

4. Console mobile App
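
A few hedged examples of what those command-line tools look like in practice (the project and bucket names are hypothetical):

$ gcloud config set project my-project
$ gcloud compute instances list
$ gsutil ls gs://my-bucket
$ bq ls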

“Cloud☁️ hands🤲 reaching from a rainbow🌈 tapping at the window, touch your hair”

Introduction to Containers

Next, Evan took us through a history of computing, first starting with deploying applications on physical servers. This solution wasted resources and took a lot of time to deploy, maintain, and scale. It also wasn’t very portable: applications were built for a specific operating system, and sometimes even for specific hardware as well.

Next transitioning to Virtualization. Virtualization made it possible to run multiple virtual servers and operating systems on the same physical computer. A hypervisor is the software layer that removes the dependencies of an operating system with its underlying hardware. It allows several virtual machines to share that same hardware.

  • A hypervisor creates and manages virtual machines
  • Running multiple apps on a single VM
  • VM-centric way to solve this problem (run a dedicated virtual machine for each application.)

Finally, Evan introduced us to containers, as they solve a lot of the shortcomings of virtualization, like:

  • Applications that share dependencies are not isolated from each other
  • Resource requirements from one application can starve out another application
  • A dependency upgrade for one application might cause another to simply stop working

Containers are isolated user spaces for running application code. Containers are lightweight as they don’t carry a full operating system. They could be scheduled or packed 📦tightly onto the underlying system, which makes them very efficient.

Containerization is the next step in the evolution of managing code.

Benefits of Containers:

  • Containers appeal to developers 👩🏽‍💻
  • Deliver high performing and scalable ⚖️ applications.
  • Containers run the same anywhere
  • Containers make it easier to build applications that use Microservices design pattern
    • Microservices
      • Highly maintainable and testable.
      • Loosely coupled. Independently deployable. Organized around business capabilities.

Containers and Container Images

An Image is an application and its dependencies

A container is simply a running 🏃‍♂️ instance of an image.

Docker 🐳 is an open source technology that allows you to create and run 🏃‍♂️ applications in containers, but it doesn’t offer a way to orchestrate those applications at scale ⚖️.

  • Containers use a varied set of Linux technologies
    • Containers use Linux namespaces to control what an application can see
    • Containers use Linux cgroups to control what an application can use.
    • Containers use union file systems to efficiently encapsulate applications and their dependencies into a set of clean, minimal layers.
  • Containers are structured in layers
    • The tool🛠 you use to build the image reads instructions from a file called the container manifest
    • For Docker 🐳-formatted container images, that file is called a Dockerfile
      • Each instruction in the Dockerfile specifies a (read-only) layer inside the container image.
      • A writable, ephemeral top layer is added when the container runs
  • Containers promote smaller shared images

How to get containers?

  • Download containerized software from a container registry gcr.io
  • Docker 🐳 – Build your own container using the open-source docker 🐳 command
  • Build your own container using Cloud Build
    • Cloud Build is a service that executes your builds on GCP.
    • Cloud Build can import source code from
      • Google Cloud Storage
      • Cloud Source Repositories
      • GitHub,
      • Bitbucket
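
As a quick sketch, building and pushing a container image with Cloud Build is a single command run from the directory containing your Dockerfile (the project and image names are hypothetical):

$ gcloud builds submit --tag gcr.io/my-project/my-app:v1 .
$ gcloud container images list --repository gcr.io/my-project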

Introduction to Kubernetes ☸️

Kubernetes ☸️ is an open source platform that helps you orchestrate and manage your container infrastructure on premises or in the cloud☁️.

It’s a container centric management environment. Google originated it and then donated it to the open source community.

K8s ☸️ automates the deployment, scaling ⚖️, load balancing🏋️‍♂️, logging, monitoring 🎛 and other management features of containerized applications.

  • Facilitates the features of an infrastructure as a service
  • Supports Declarative configurations.
  • Allows Imperative configuration
  • Open Source

K8s ☸️ features:

  • Supports both stateful and stateless applications
  • Autoscaling⚖️
  • Resource limits
  • Extensibility

Kubernetes also supports workload portability across on-premises or multiple cloud service providers. This allows Kubernetes to be deployed anywhere. You can move Kubernetes ☸️ workloads freely without vendor lock-in 🔒.

Google Kubernetes Engine (GKE)

GKE easily deploys, manages and scales⚖️ Kubernetes environments for your containerized applications on GCP.

GKE Features:

  • Fully managed
    • Cluster – your containerized applications all run on top of a cluster in GKE.
    • Nodes are the virtual machines that host your containers inside of a GKE app cluster
  • Container-optimized OS
  • Auto upgrade
  • Auto repair🛠
  • Cluster Scaling⚖️
  • Seamless Integration
  • Identity and access management (IAM)
  • Integrated logging and monitoring (Stack Driver)
  • Integrated networking
  • Cloud Console

Compute Options Detail

Compute Engine

  • Fully customizable virtual machines
  • Persistent disks and optional local SSDs
  • Global load balancing and autoscaling⚖️
  • Per-second billing

                  Use Cases

  • Complete control over the OS and virtual hardware
    • Well suited for lift-and-shift migrations to the cloud
    • Most flexible 🧘‍♀️ compute solution, often used when a managed solution is too restrictive

  App Engine

  • Provides a fully managed code-first platform
  • Streamlines application deployment and scalability⚖️
  • Provides support for popular programming languages and application runtimes
  • Supports integrated monitoring 🎛, logging and diagnostics.

Use Cases

  • Websites
    • Mobile app📱 and gaming backends
    • Restful APIs

Google Kubernetes Engine

  • Fully managed Kubernetes Platform
  • Supports cluster scaling ⚖️, persistent disk, automated upgrades, and auto node repairs
  • Built-in integration with GCP
  • Portability across multiple environments
    • Hybrid computing
    • Multi-cloud computing

Use Cases

  • Containerized applications
    • Cloud-native distributed systems
    • Hybrid applications

Cloud Run

  • Enable stateless containers
  • Abstracts away infrastructure management
  • Automatically scales ⚖️ up⬆️ and down⬇️
  • Open API and runtime environment

Use Cases

  • Deploy stateless containers that listen for requests or events
    • Build applications in any language using any frameworks and tool🛠

Cloud Functions

  • Event-driven, serverless compute services
  • Automatic scaling with highly available and fault-tolerant design
  • Charges apply only when your code runs 🏃‍♂️
  • Triggered based on events in GCP, HTTP endpoints, and Firebase

Use Cases

  • Supporting microservice architecture
    • Serverless application backends
      • Mobile and IoT backends
      • Integrate with third-party services and APIs
    • Intelligent applications
      • Virtual assistant and chat bots
      • Video and image analysis
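
Cloud Functions deployments are also just a gcloud one-liner; here is a hedged sketch (the function name, runtime, and region are illustrative assumptions):

$ gcloud functions deploy hello-http --runtime python39 --trigger-http --allow-unauthenticated --region us-central1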

Kubernetes ☸️ Architecture

There are two related concepts for understanding how K8s ☸️ works: the object model and the principle of declarative management.

Pods – the basic building block of K8s ☸️

  • Smallest deployable object.
  • Containers in a Pod share resources
  • Pods are not self-healing

Principle of declarative management – you declare objects to represent the state you want (for example, the containers you want running).

  • K8s ☸️ creates and maintains one or more objects.
  • K8s ☸️ compares the desired state to the current state.

The Kubernetes ☸️ Control Plane ✈️ continuously monitors the state of the cluster, endlessly comparing reality to what has been declared and remedying the state as needed.

K8s ☸️ Cluster consists of a Master and Nodes

The Master’s role is to coordinate the entire cluster.

  • View or change the state of the cluster including launching pods.
  • kube-apiserver – the single component that you interact with directly; commands (e.g. from kubectl) are sent to it
    • the kube-apiserver interacts with the cluster database (etcd) on behalf of the rest of the system
  • etcd – key-value store for the most critical data of a distributed system
  • kube-scheduler – assigns Pods to Nodes
  • kube-cloud-manager – embeds cloud-specific control logic.
  • kube-controller-manager – daemon that embeds the core control loops

Nodes run pods.

  • kubelet is the primary “node agent” that runs on each node.
  • kube-proxy is a network proxy that runs on each node in your cluster

Google Kubernetes ☸️ Engine Concepts

GKE makes administration of K8s ☸️ much simpler

  • Master
    • GKE manages all the control plane components
    • GKE provisions and manages all the master infrastructure
  • Nodes
    • GKE manages this by deploying and registering Compute Engine instances as Nodes
    • Use node pools to manage different kinds of nodes
      • a node pool is a subset of nodes within a cluster that share a configuration, such as their amount of memory or their CPU generation.
      • node pools are a GKE-specific feature
        • enable automatic node upgrades
        • automatic node repairs 🛠
        • cluster auto scaling ⚖️

Zonal Cluster – has a single control plane in a single zone.

  • single-zone cluster has a single control plane running in one zone
  • multi-zonal cluster has a single replica of the control plane running in a single zone, and has nodes running in multiple zones.

Regional Cluster – has multiple replicas of the control plane, running in multiple zones within a given region.

Private Cluster – provides the ability to isolate nodes from having inbound and outbound connectivity to the public internet.
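
To make the zonal vs. regional distinction concrete, here is a minimal sketch with gcloud (the cluster names are hypothetical):

$ gcloud container clusters create my-zonal-cluster --zone us-central1-a --num-nodes 3
$ gcloud container clusters create my-regional-cluster --region us-central1 --num-nodes 1
$ gcloud container clusters get-credentials my-regional-cluster --region us-central1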

Kubernetes ☸️ Object Management – objects are identified by a unique name and a unique identifier.

  • Objects are defined in a YAML file
  • Objects are identified by a name
  • Objects are assigned a unique identifier (UID) by K8s ☸️
  • Labels 🏷 are key value pairs that tag your objects during or after their creation.
  • Labels 🏷 help you identify and organize objects and subsets of objects.
  • Labels 🏷 can be matched by label selectors

Pods and Controller Objects

Pods have a life cycle🔄

  • Controller Object types
    • Deployment – ensure that sets of Pods are running
      • To perform the upgrade, the Deployment object will create a second ReplicaSet object, and then increase the number of (upgraded) Pods in the second ReplicaSet while it decreases the number in the first ReplicaSet
    • StatefulSet
    • DaemonSet
    • Job
  • Allocating resource quotas
  • Namespaces – provide scope for naming resources (pods, deployments and controllers.)

There are 3 initial namespaces in the cluster.

  1. default – the namespace for objects with no other namespace defined.
  2. kube-system – the namespace for objects created by the Kubernetes system itself.
  3. kube-public – the namespace for objects that are publicly readable to all users.

Best practice tip: namespace neutral YAML

  • Apply namespaces at the command line level, which makes YAML files more flexible🧘‍♀️.
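
For example, a sketch of keeping the YAML namespace-neutral and supplying the namespace at the command line (the namespace and file names are hypothetical):

$ kubectl create namespace test
$ kubectl apply -f deployment.yaml --namespace=test
$ kubectl get pods --namespace=test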

Advanced K8s ☸️ Objects

Services

  • A Service groups a set of Pods and assigns a policy by which you can access those pods
    • Services provide load-balanced 🏋️‍♂️ access to specified Pods. There are three primary types of Services:
      • ClusterIP: Exposes the service on an IP address that is only accessible from within this cluster. This is the default type.
      • NodePort: Exposes the service on the IP address of each node in the cluster, at a specific port number.
      • LoadBalancer 🏋️‍♂️: Exposes the service externally, using a load balancing 🏋️‍♂️ service provided by a cloud☁️ provider.
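
A hedged sketch of exposing a Deployment two different ways with kubectl (the deployment and service names are hypothetical):

$ kubectl expose deployment my-app --name my-app-internal --port 80 --target-port 8080 --type ClusterIP
$ kubectl expose deployment my-app --name my-app-public --port 80 --target-port 8080 --type LoadBalancer
$ kubectl get services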

Volume

  • A directory that is accessible to all containers in a Pod
    • Requirements of the volume can be specified using Pod specification
    • You must mount these volumes specifically on each container within a Pod
    • Set up Volumes using external storage outside of your Pods to provide durable storage

Controller Objects

  • ReplicaSets – ensure that a specified number of Pod replicas are running at any given time
    • Deployments
      • Provides declarative updates to ReplicaSets and Pods
      • Create, update, roll back, and scale⚖️ Pods, using ReplicaSets
      • Replication Controllers – perform a similar role to the combination of ReplicaSets and Deployments, but their use is no longer recommended.
    • StatefulSets – similar to a Deployment, Pods use the same container specs
    • DaemonSets – ensures that a Pod is running on all or some subset of the nodes.
    • Jobs – creates one or more Pods required to run a task

“Can I get an encore; do you want more?”

Migrate for Anthos – tool🛠 for getting workloads into containerized deployments on GCP

  • Automated process that moves your existing applications into a K8s ☸️ environment.

Migrate for Anthos moves VMs to containers

  • Move and convert workloads into containers
  • Workloads can start as physical servers or VMs
  • Moves workload compute to container immediately (<10 min)
  • Data can be migrated all at once or “streamed” to the cloud☁️ until the app is live in the cloud☁️

Migrate for Anthos Architecture

  • A migration requires an architecture to be built
  • A migration is a multi-step process
  • Configure processing cluster
  • Add migration source
  • Generate and review plan
  • Generate artifacts
  • Test
  • Deploy

Migrate for Anthos Installation -requires a processing cluster

         Installing Migrate for Anthos uses migctl

$migctl setup install

         Adding a source enables migration from a specific environment

$migctl source create ce my-ce-src --project my-project --zone zone

         Creating a migration generates a migration plan

$migctl migration create test-migration --source my-ce-src --vm-id my-id --intent image

         Executing a migration generates resource and artifacts

$migctl migration generate-artifacts my-migration

         Deployment files typically need modification

$migctl migration get-artifacts test-migration

Apply the configuration to deploy the workload

$kubectl apply -f deployment_spec.yaml

“And we’ll bask 🌞 in the shadow of yesterday’s triumph🏆 And sail⛵️ on the steel breeze🌬

Below are some of the destinations I am considering for my travels for next week:

Thanks –

–MCS

Week of October 9th

Part I of a Cloud☁️ Journey

“On a cloud☁️ of sound 🎶, I drift in the night🌙”

Hi All –

Ok Happy Hump 🐫 Day!

The last few weeks we spent some quality time ⏰ visiting with Microsoft SQL Server 2019. A few weeks back, we kicked 🦿the tires 🚗 with IQP and the improvements made to TempDB. Then week after we were fully immersed with Microsoft’s most ambitious offering in SQL Server 2019 with Big Data Clusters (BDC).

This week we make our triumphant return back to the cloud☁️. If you have been following our travels this past summer☀️, we focused on the core concepts of AWS and then we concentrated on the fundamentals of Microsoft Azure. So, the obvious natural progression of our continuous cloud☁️ journey✈️ would be to further explore the Google Cloud Platform, more affectionately known as GCP. We had spent a considerable amount of time 🕰 before learning many of the exciting offerings in GCP, but our long-awaited return is long overdue. Besides, we felt the need to give some love ❤️ and oxytocin 🤗 to our friends from Mountain View.

“It starts with one☝️ thing…I don’t know why?”

Actually, Google has 10 things when it comes to their philosophy but more on that later. 😊

One of Google’s strong 💪 beliefs is that in “in the future every company will be a data company because making the fastest and best use of data is a critical source of a competitive advantage.”

GCP is Google’s Cloud Computing☁️ solution that provides a wide variety of services such as Compute, Storage🗄, Big data, and Machine Learning for managing and getting value from data and doing that at infinite scale⚖️. GCP offers over 90 products and Services.

“If you know your history… Then you would know where you coming from”

In 2006, AWS began offering cloud computing☁️ to the masses; several years later Microsoft Azure followed suit, and shortly after that GCP joined the Flexible🧘‍♀️, Agile, Elastic, Highly Available and Scalable⚖️ party 🥳. Although Google was a late arrival to the cloud computing☁️ shindig🎉, their approach and strategy to Cloud☁️ Computing is far from a “Johnny-come-lately” 🤠

“Google Infrastructure for Everyone” 😀

Google does not view cloud computing☁️ as a “commodity” cloud☁️. Google’s methodology to cloud computing☁️ is of a “premier💎 cloud☁️”, one that provides the same innovative, high-quality, deluxe set of services, and rich development environment with the advanced hardware that Google has been running🏃‍♂️ internally for years but made available for everyone through GCP.

“No vendor lockin 🔒. Guaranteed. 👍

In addition, Google who is certainly no stranger to Open Source software promotes a vision🕶 of the “open cloud☁️”. A cloud☁️ environment where companies🏢🏭 large and small 🏠can seamlessly move workloads from one cloud☁️ provider to another. Google wants customers to have the ability to run 🏃‍♂️their applications anywhere not just in Google.

“Get outta my dreams😴… Get into my car 🚙

Now that I’ve extolled the virtues of Google’s vision 🕶 and strategy for Cloud computing☁️, it’s time to take this car 🚙 out for a spin. Fortunately, the team at Google Cloud☁️ have put together one of the best compilations since the Zeppelin Box Set 🎸 in their Google Cloud Certified Associate Cloud Engineer Path on Pluralsight.

Since there is so much to unpack📦, we will need to break our learnings down into multiple parts. So, helping us put our best foot 🦶 forward through the first part of our journey ✈️ will be Googler Brian Rice and former Googler Catherine Gamboa, through their Google Cloud Platform Fundamentals – Core Infrastructure course.

In a great introduction, Brian expounds on the definition of Cloud Computing☁️ and gives a brief history of Google’s transformation from the virtualization model to a container‑based architecture, an automated, elastic, third‑wave cloud☁️ built from automated services.

Next, Brian reviews GCP computing architectures:

Infrastructure as a Service (IaaS) – provide raw compute, storage🗄, and network organized in ways that are familiar from data centers. You pay for what you allocate

Platform as a Service (PaaS) – binds application code you write to libraries📚 that give access to the infrastructure your application needs. You pay 💰 for what you use.

Software as a Service (SaaS) – applications in that they’re consumed directly over the internet by end users. Popular examples: Search 🔎, Gmail 📧, Docs 📄, and Drive 💽

Then we had an overview of Google’s network which according to some estimates carries as much as 40% of the world’s 🌎 internet traffic🚦. The network interconnects at more than 90 internet exchanges and more than 100 points of presence worldwide 🌎 (and growing). One of the benefits of GCP is that it leverages Google’s robust network. Allowing GCP resources to be hosted in multiple locations worldwide 🌎. At granular level these locations are organized by regions and zones. A region is a specific geographical 🌎 location where you can host your resources. Each region has one or more zones (most regions have three or more zones).

All of the zones within a region have fast⚡️ network connectivity among them. A zone is like a single failure domain within a region. A best practice in building a fault‑tolerant application is to deploy resources across multiple zones in a given region to protect against unexpected failures.

Next, we had a summary of Google’s multi-layered approach to security🔒.

Highlights:

  • Server boards and the networking equipment in Google data centers are custom‑designed by Google.
  • Google also designs custom chips, including a hardware security🔒 chip (Titan) deployed on both servers and peripherals.
  • Google Server machines use cryptographic signatures✍️ to make sure they are booting the correct software.
  • Google designs and builds its own data centers (eco friendly), which incorporate multiple layers of physical security🔒 protections. (Access to these data centers is limited to only a few authorized Google Employees)
  • Google’s infrastructure provides cryptographic🔐 privacy🤫 and integrity for remote procedure‑called data on the network, which is how Google Services communicate with each other.
  • Google has multitier, multilayer denial‑of‑service protections that further reduces the risk of any denial‑of‑service 🛡 impact.

Rounding out the introduction was a sneak peek 👀 into Budgets and Billing 💰. Google offers customer-friendly 😊 pricing, with per‑second billing for its IaaS compute offering; fine‑grained billing is a big cost savings for workloads that are bursting. GCP provides four tools 🛠 to help with billing:

  • Budgets and alerts 🔔
  • Billing export🧾
  • Reports 📊
  • Quotas

Budgets can be a fixed limit, or you can tie it to another metric, for example a percentage of the previous month’s spend.

Alerts 🔔 are generally set at 50%, 90%, and 100%, but are customizable

Billing export🧾 stores detailed billing information in places where it’s easy to retrieve for more detailed analysis.

Reports📊 is a visual tool in the GCP console that allows you to monitor your expenditure. GCP also implements quotas, which protect both account owners and the GCP community as a whole 😊.

Quotas are designed to prevent the overconsumption of resources, whether because of error or malicious attack 👿. There are two types of quotas, rate quotas and allocation quotas. Both get applied at the level of the GCP project.

After a great intro, next Catherine kick starts🦵 us with GCP. She begins with a discussion around resource hierarchy 👑 and trust🤝 boundaries.

Projects are the main way you organize the resources (all resources belong to a project) you use in GCP. Projects are used to group together related resources, usually because they have a common business objective. A project consists of a set of users, a set of APIs, billing, quotas, authentication, and monitoring 🎛 settings for those APIs. Projects have 3 identifying attributes:

  1. Project ID (Globally 🌎 unique)
  2. Project Name
  3. Project Number (Globally 🌎 Unique)

Projects may be organized into folders 🗂. Folders🗂 can contain other folders 🗂. All the folders 🗂 and projects used by an organization can be put in organization nodes.

Please Note: If you use folders 🗂, you need to have an organization node at the top of the hierarchy👑.

Projects, folders🗂, and organization nodes are all places where the policies can be defined.

A policy is set on a resource. Each policy contains a set of roles and members👥.

  • A role is a collection of permissions. Permissions determine what operations are allowed on a resource. There are three kinds of roles (primitive):
  1. Owner
  2. Editor
  3. Viewer

Another role made available in IAM, which controls the billing for a project without the right to change the resources in the project, is the billing administrator role.

Please note IAM provides finer‑grained types of roles for a project that contains sensitive data, where primitive roles are too generic.

A service account is a special type of Google account intended to represent a non-human user ⚙️ that needs to authenticate and be authorized to access data in Google APIs.

  • A member(s)👥 can be a Google Account(s), a service account, a Google group, or a Google Workspace or Cloud☁️ Identity domain that can access a resource.

Resources inherit policies from the parent.

Identity and Access Management (IAM) allows administrators to manage who (i.e. a Google account, a Google group, a service account, or an entire Workspace) can do what (role) on specific resources. There are four ways to interact with IAM and the other GCP management layers:

  1. Web‑based console 🕸
  2. SDK and Cloud shell (CLI)
  3. APIs
    • Cloud Client Libraries 📚
    • Google API Client Library 📚
  4. Mobile app 📱

When it comes to entitlements “The principle of least privilege” should be followed. This principle says that each user should have only those privileges needed to do their jobs. In a least privilege environment, people are protected from an entire class of errors.  GCP customers use IAM to implement least privilege, and it makes everybody happier 😊.

For example, you can designate an organization policy administrator so that only people with privilege can change policies. You can also assign a project creator role, which control who can spend money 💵.

Finally, we checked into Marketplace 🛒 which provides an easy way to launch common software packages in GCP. Many common web 🕸 frameworks, databases🛢, CMSs, and CRMs are supported. Some Marketplace 🛒 images charge usage fees, like third parties with commercially licensed software. But they all show estimates of their monthly charges before you launch them.

Please Note: GCP updates the base images for these software packages to fix critical issues 🪲and vulnerabilities, but it doesn’t update the software after it’s been deployed. However, you’ll have access to the deployed system so you can maintain them.

“Look at this stuff🤩 Isn’t it neat? Wouldn’t you think my collection’s complete 🤷‍♂️?

Now with basics of GCP covered, it was time 🕰 to explore 🧭 some the computing architectures made available within GCP.

Google Compute Engine

Virtual Private Cloud (VPC) – manages networking functionality for your GCP resources. Unlike AWS (natively), GCP VPCs are global 🌎 in scope. They can have subnets in any GCP region worldwide 🌎, and subnets can span the zones that make up a region.

  • Provides flexibility🧘‍♀️ to scale️ and control how workloads connect regionally and globally🌎
  • Access VPCs without needing to replicate connectivity or administrative policies in each region
  • Bring your own IP addresses to Google’s network infrastructure across all regions

Much like physical networks, VPCs have routing tables👮‍♂️and Firewall🔥 Rules which are built in.

  • Routing tables👮‍♂️ forward traffic🚦from one instance to another instance
  • Firewall🔥 allows you to restrict access to instances, both incoming(ingress) and outgoing (egress) traffic🚦.
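
As a rough sketch of those building blocks with gcloud (the network, subnet, and rule names are hypothetical):

$ gcloud compute networks create my-vpc --subnet-mode custom
$ gcloud compute networks subnets create my-subnet --network my-vpc --region us-central1 --range 10.0.1.0/24
$ gcloud compute firewall-rules create allow-ssh-ingress --network my-vpc --direction INGRESS --action ALLOW --rules tcp:22 --source-ranges 0.0.0.0/0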

Cloud DNS is a low-latency, high-availability managed DNS service running on the same infrastructure as Google.

Cloud VPN securely connects peer network to Virtual Private Cloud (VPC) network through an IPsec VPN connection.

Cloud Router lets other networks and Google VPC exchange route information over the VPN using the Border Gateway Protocol.

VPC Network Peering enables you to connect VPC networks so that workloads in different VPC networks can communicate internally. Traffic🚦stays within Google’s network and doesn’t traverse the public internet.

  • Private connection between you and Google for your hybrid cloud☁️
  • Connection through the largest partner network of service providers

Dedicated Interconnect allows direct private connections, providing the highest uptime (99.99% SLA) for interconnection with GCP.

Google Compute Engine (IaaS) delivers Linux or Windows virtual machines (VMs) running in Google’s innovative data centers and worldwide fiber network. Compute Engine offers scale ⚖️, performance, and value that lets you easily launch large compute clusters on Google’s infrastructure. There are no upfront investments, and you can run thousands of virtual CPUs on a system that offers quick, consistent performance. VMs can be created via Web 🕸 console or the gcloud command line tool🔧.

For Compute Engine VMs there are two kinds of persistent storage🗄 options:

  • Standard
  • SSD

If your application needs a high‑performance disk, you can attach a local SSD. ⚠️ Be sure to store data of permanent value somewhere else, because a local SSD’s content doesn’t last past when the VM terminates.

Compute Engine offers innovative pricing:

  • Per second billing
  • Preemptible instances
  • High throughput to storage🗄 at no additional cost
  • Only pay for hardware you need.

At the time of this post, N2D standard and high-CPU machine types have up to 224 vCPUs and 128 GB of memory which seems like enough horsepower 🐎💥 but GCP keeps upping 🃏🃏 the ante 💶 on maximum instance type, vCPU, memory and persistent disk. 😃

Sample Syntax creating a VM:

$ gcloud compute zones list | grep us-central1

$ gcloud config set compute/zone us-central1-c
$ gcloud compute instances create "my-vm-2" --machine-type "n1-standard-1" --image-project "debian-cloud" --image "debian-9-stretch-v20170918" --subnet "default"

Compute Engine also offers autoscaling ⚖️, which adds and removes VMs from applications based on load metrics. In addition, Compute Engine VPCs offer load balancing 🏋️‍♀️ across VMs. VPC supports several different kinds of load balancing 🏋️‍♀️:

  • Layer 7 load balancing 🏋️‍♀️ of HTTP/HTTPS traffic, based on load
  • Layer 4 load balancing 🏋️‍♀️ of non-HTTP SSL traffic, based on load
  • Layer 4 load balancing 🏋️‍♀️ of non-SSL TCP traffic
  • Any Traffic🚦 (TCP, UDP)
  • Traffic🚦inside a VPC

Cloud CDN –accelerates💥 content delivery 🚚 in your application allowing users to experience lower network latency. The origins of your content will experience reduced load, and cost savings. Once you’ve set up HTTPS load balancing 🏋️‍♀️, simply enable Cloud CDN with a single checkbox.

Next on our plate 🍽 was to investigate the storage🗄 options that are available in GCP

Cloud Storage 🗄 is a fully managed, highly durable, highly available, scalable ⚖️ service. Cloud Storage 🗄 can be used for lots of use cases like serving website content, storing data for archival and disaster recovery, or distributing large data objects.

Cloud Storage🗄 offers 4 different types of storage 🗄 classes:

  • Regional
  • Multi‑regional
  • Nearline 😰
  • Coldline 🥶

Cloud Storage🗄 is comprised of buckets 🗑, which you create, configure, and use to hold storage🗄 objects.

Buckets 🗑 are:

  • Globally 🌎 Unique
  • Different storage🗄 classes
  • Regional or multi-regional
  • Versioning enabled (Objects are immutable)
  • Lifecycle 🔄 management Rules

Cloud Storage🗄supports several ways to bring data into Cloud Storage🗄.

  • Use gsutil Cloud SDK.
  • Drag‑and‑drop in the GCP console (with Google Chrome browser).
  • Integrated with many of the GCP products and services:
    • Import and export tables from and to BigQuery and Cloud SQL
    • Store app engine logs
    • Cloud data store backups
    • Objects used by app engine
    • Compute Engine Images
  • Online storage🗄 transfer service (>TB) (HTTPS endpoint)
  • Offline transfer appliance (>PB) (rack-able, high capacity storage🗄 server that you lease from Google)
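
For instance, a quick sketch of the gsutil route (the bucket name and storage class are illustrative):

$ gsutil mb -l us-central1 -c standard gs://my-example-bucket
$ gsutil cp local-file.txt gs://my-example-bucket/
$ gsutil ls gs://my-example-bucket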

“Big wheels 𐃏 keep on turning”

Cloud Bigtable is a fully managed, scalable⚖️ NoSQL database🛢 service for large analytical and operational workloads. The databases🛢 in Bigtable are sparsely populated tables that can scale to billions of rows and thousands of columns, allowing you to store petabytes of data. Data encryption in flight and at rest is automatic.

GCP fully manages the service, so you don’t have to configure and tune it. It’s ideal for data that has a single lookup key🔑 and for storing large amounts of data with very low latency.

Cloud Bigtable is offered through the same open source API as HBase, which is the native database🛢 for the Apache Hadoop 🐘 project.

Cloud SQL is a fully managed relational database🛢 service for MySQL, PostgreSQL, and MS SQL Server which provides:

  • Automatic replication
  • Managed backups
  • Vertical scaling ⚖️ (Read and Write)
  • Horizontal Scaling ⚖️
  • Google integrated Security 🔒
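
A minimal sketch of spinning up a Cloud SQL instance and database with gcloud (the instance name, tier, and database name are hypothetical):

$ gcloud sql instances create my-sql-instance --database-version=MYSQL_8_0 --tier=db-n1-standard-1 --region=us-central1
$ gcloud sql databases create orders --instance=my-sql-instance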

Cloud Spanner is a fully managed relational database🛢 with unlimited scale⚖️ (horizontal), strong consistency & up to 99.999% high availability.

It offers transactional consistency at a global🌎 scale ⚖️, schemas, SQL, and automatic synchronous replication for high availability, and it can provide petabytes of capacity.

Cloud Datastore is a highly scalable ⚖️ (Horizontal) NoSQL database🛢 for your web 🕸 and mobile 📱 applications.

  • Designed for application backends
  • Supports transactions
  • Includes a free daily quota

Comparing Storage🗄 Options

Cloud Datastore is the best for semi‑structured application data that is used in App Engine applications.

Bigtable is best for analytical data with heavy read/write events like AdTech, Financial 🏦, or IoT📲 data.

Cloud Storage🗄 is best for structured and unstructured binary or object data, like images🖼, large media files🎞, and backups.

Cloud SQL is best for web 🕸 frameworks and existing applications, like storing user credentials and customer orders.

Cloud Spanner is best for large‑scale⚖️ database🛢 applications that are larger than 2 TB, for example, for financial trading and e‑commerce use cases.

“Everybody, listen to me… And return me my ship⛵️… I’m your captain👩🏾️, I’m your captain👩🏾‍✈️”

Containers, Kubernetes ☸️, and Kubernetes Engine

Containers provide independent scalable ⚖️ workloads, that you would get in a PaaS environment, and an abstraction layer of the operating system and hardware, like you get in an IaaS environment. Containers virtualize the operating system rather than the hardware. The environment scales⚖️ like PaaS but gives you nearly the same flexibility as Infrastructure as a Service

Kubernetes ☸️ is an open source orchestrator for containers. K8s ☸️ makes it easy to orchestrate many containers on many hosts, scale ⚖️ them, roll out new versions of them, and even roll back to the old version if things go wrong 🙁. K8s ☸️ lets you deploy containers on a set of nodes called a cluster.

A cluster is a set of master components that control the system as a whole, and a set of nodes that run containers.

When K8s ☸️ deploys a container or a set of related containers, it does so inside an abstraction called a pod.

A pod is the smallest deployable unit in Kubernetes.

Kubectl starts a deployment with a container running in a pod. A deployment represents a group of replicas of the same pod. It keeps your pods running 🏃‍♂️, even if a node on which some of them run fails.

Google Kubernetes Engine (GKE) ☸️ is a secured and managed Kubernetes service ☸️ with four-way auto scaling ⚖️ and multi-cluster support.

  • Leverage a high-availability control plane ✈️including multi-zonal and regional clusters
  • Eliminate operational overhead with auto-repair 🧰, auto-upgrade, and release channels
  • Secure🔐 by default, including vulnerability scanning of container images and data encryption
  • Integrated Cloud Monitoring 🎛 with infrastructure, application, and Kubernetes-specific ☸️ views

GKE is like an IaaS offering in that it saves you infrastructure chores and it’s like a PaaS offering in that it was built with the needs of developers 👩‍💻 in mind.

Sample Syntax building a K8 cluster:

gcloud container clusters create k1

In GKE to make the pods in your deployment publicly available, you can connect a load balancer🏋️‍♀️ to it by running the kubectl expose command. K8s ☸️ then creates a service with a fixed IP address for your pods.

A service is the fundamental way K8s ☸️ represents load balancing 🏋️‍♀️. K8s ☸️ attaches an external load balancer🏋️‍♀️ with a public IP address to your service so that others outside the cluster can access it.

In GKE, this kind of load balancer🏋️‍♀️ is created as a network load balancer🏋️‍♀️. This is one of the managed load balancing 🏋️‍♀️ services that Compute Engine makes available to virtual machines. GKE makes it easy to use it with containers.

A service groups a set of pods together and provides a stable endpoint for them.

Imperative commands

kubectl get services – shows you your service’s public IP address

kubectl scale – scales ⚖️ a deployment

kubectl expose – creates a service

kubectl get pods – watches the pods come online

The real strength 💪 of K8s ☸️ comes when you work in a declarative way. Instead of issuing commands, you provide a configuration file (YAML) that tells K8s ☸️ what you want your desired state to look like, and Kubernetes ☸️ figures out how to do it.

When you choose a rolling update for a deployment and then give it a new version of the software it manages, Kubernetes will create pods of the new version one by one, waiting for each new version pod to become available before destroying one of the old version pods. Rolling updates are a quick way to push out a new version of your application while still sparing your users from experiencing downtime.
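
As a hedged sketch of that declarative flow (the deployment name and YAML file are hypothetical), you apply the desired state, bump the image to trigger a rolling update, and can roll back if needed:

$ kubectl apply -f nginx-deployment.yaml
$ kubectl set image deployment nginx-deployment nginx=nginx:1.21
$ kubectl rollout status deployment nginx-deployment
$ kubectl rollout undo deployment nginx-deployment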

“Going where the wind 🌬 goes… Blooming like a red rose🌹

Introduction to Hybrid and Multi-Cloud Computing (Anthos)

Modern hybrid or multi‑cloud☁️ architectures allow you to keep parts of your system’s infrastructure on‑premises while moving other parts to the cloud☁️, creating an environment that is uniquely suited to many companies’ needs.

Modern distributed systems allow a more agile approach to managing your compute resources

  • Move only some of your compute workloads to the cloud ☁️
  • Move at your own pace
  • Take advantage of cloud’s☁️ scalability️ and lower costs 💰
  • Add specialized services to compute resources stack

Anthos is Google’s modern solution for hybrid and multi-cloud☁️ systems and services management.

The Anthos framework rests on K8s ☸️ and GKE deployed on‑prem, which provides the foundation for an architecture that is fully integrated with centralized management through a central control plane that supports policy‑based application life cycle🔄 delivery across hybrid and multi‑cloud☁️ environments.

Anthos also provides a rich set of tools🛠 for monitoring 🎛 and maintaining the consistency of your applications across all of your network, whether on‑premises, in the cloud☁️ K8s ☸️, or in multiple clouds☁️☁️.

Anthos Configuration Management provides a single source of truth for your cluster’s configuration. That source of truth is kept in the policy repository, which is actually a Git repository.

“And I discovered🕵️‍♀️ that my castles 🏰 stand…Upon pillars of salt🧂 and pillars of sand 🏖

App Engine (PaaS) lets you build highly scalable ⚖️ applications on a fully managed serverless platform.

App Engine makes deploying, maintaining, and autoscaling ⚖️ workloads easy, allowing developers 👨‍💻 to focus on innovation.

GCP provides an App Engine SDK in several languages, so developers 👩‍💻 can test applications locally before uploading them to the real App Engine service.

App Engine’s standard environment provides runtimes for specific versions of Java☕️, Python🐍, PHP, and Go. The standard environment also enforces restrictions🚫 on your code by making it run in a so‑called sandbox. That’s a software construct that’s independent of the hardware, operating system, or physical location of the server it runs🏃‍♂️ on.

If these constraints don’t work for a given application, that would be a reason to choose the flexible environment.

App Engine flexible environment:

  • Builds and deploys containerized apps with a click
  • No sandbox constraints
  • Can access App Engine resources

App Engine flexible environment apps use standard runtimes and can access App Engine services such as:

  • Datastore
  • Memcache
  • Task Queues

Cloud Endpoints – Develop, deploy, and manage APIs on any Google Cloud☁️ back end.

Cloud Endpoints helps you create and maintain APIs

  • Distributed API management through an API console
  • Expose your API using a RESTful interface

Apigee Edge is also a platform for developing and managing API proxies.

Apigee Edge focuses on business problems like rate limiting, quotas, and analytics.

  • A platform for making APIs available to your customers and partners
  • Contains analytics, monetization, and a developer portal

Developing in the Cloud ☁️

Cloud Source Repositories – Fully featured Git repositories hosted on GCP

Cloud Functions – Scalable ⚖️ pay-as-you-go functions as a service (FaaS) to run your code with zero server management.

  • No servers to provision, manage, or upgrade
  • Automatically scale⚖️ based on the load
  • Integrated monitoring 🎛, logging, and debugging capability
  • Built-in security🔒 at role and per function level based on the principle of least privilege
  • Key🔑 networking capabilities for hybrid and multi-cloud☁️☁️ scenarios
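
A minimal sketch of deploying one (the function name, runtime, and source in the current directory are placeholders):

# Deploy an HTTP-triggered function from the current directory
$gcloud functions deploy hello-world --runtime=python39 --trigger-http --allow-unauthenticated
# Invoke it once deployed
$gcloud functions call hello-world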

Deployment: Infrastructure as code

Deployment Manager – creates and manages cloud☁️ resources with simple templates

  • Provides repeatable deployments
  • Create a .yaml template describing your environment and use Deployment Manager to create resources
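
A tiny illustration of that flow (the deployment name and the abbreviated config.yaml below are hypothetical):

# config.yaml – declare the resources you want (abbreviated)
#   resources:
#   - name: my-vm
#     type: compute.v1.instance
#     properties:
#       zone: us-central1-a
#       ...
$gcloud deployment-manager deployments create my-deployment --config config.yaml
$gcloud deployment-manager deployments describe my-deployment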

“Follow my lead, oh, how I need… Someone to watch over me”

Monitoring 🎛: Proactive instrumentation

Stackdriver is GCP’s tool for monitoring 🎛, logging and diagnostics. Stackdriver provides access to many different kinds of signals from your infrastructure platforms, virtual machines, containers, middleware and application tier; logs, metrics and traces. It provides insight into your application’s health ⛑, performance and availability. So, if issues occur, you can fix them faster.

Here are the core components of Stackdriver:

  • Monitoring 🎛
  • Logging
  • Trace
  • Error Reporting
  • Debugging
  • Profiler

Stackdriver Monitoring 🎛 checks the endpoints of web 🕸 applications and other Internet‑accessible services running in your cloud☁️ environment.

Stackdriver Logging lets you view logs from your applications and filter and search on them.

Stackdriver Error Reporting tracks and groups the errors in your cloud☁️ applications and notifies you when new errors are detected.

Stackdriver Trace samples the latency of App Engine applications and reports per‑URL statistics.

Stackdriver Debugger connects your application’s production data to your source code so you can inspect the state of your application at any code location in production.
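
As a small, hedged example of poking at those signals from the command line (the filter and limit values are arbitrary):

# Pull the most recent Compute Engine error entries from the logging backend
$gcloud logging read 'resource.type="gce_instance" AND severity>=ERROR' --limit=10
# List the logs available in the current project
$gcloud logging logs list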

“Whoa oh oh oh oh… Something big I feel it happening”

GCP Big Data Platform – services are fully managed, scalable ⚖️, and serverless

Cloud Dataproc is a fast, easy, managed way to run🏃‍♂️ Hadoop 🐘, MapReduce 🗺, Spark 🔥, Pig 🐷 and Hive 🐝 as a service

  • Create clusters in 90 seconds or less on average
  • Scale⚖️ cluster up and down even when jobs are running 🏃‍♂️
  • Easily migrate on-premises Hadoop 🐘 jobs to the cloud☁️
  • Uses Spark🔥 Machine Learning Libraries📚 (MLib) to run classification algorithms
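
For instance (the cluster name and region are placeholders), standing up a cluster and submitting the stock SparkPi example is just a few commands:

$gcloud dataproc clusters create my-cluster --region=us-central1 --num-workers=2
$gcloud dataproc jobs submit spark --cluster=my-cluster --region=us-central1 \
    --class=org.apache.spark.examples.SparkPi \
    --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar -- 1000
# Tear the cluster back down when the job is done
$gcloud dataproc clusters delete my-cluster --region=us-central1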

Cloud Dataflow🚰 – Stream⛲️ and Batch processing; unified and simplified pipelines

  • Processes data using Compute Engine instances.
  • Clusters are sized for you
  • Automated scaling ⚖️, no instance provisioning required
  • Managed expressive data Pipelines
  • Write code once and get batch and streaming⛲️.
  • Transform-based programming model
  • ETL pipelines to move, filter, enrich, shape data
  • Data analysis: batch computation or continuous computation using streaming
  • Orchestration: create pipelines that coordinate services, including external services
  • Integrates with GCP services like Cloud Storage🗄, Cloud Pub/Sub, BigQuery and BigTable
  • Open source Java☕️ and Python 🐍 SDKs
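
One hedged way to see the managed-pipeline side without writing any code is to run a Google-provided template (the bucket paths are placeholders):

# Run the provided Word_Count template as a managed Dataflow🚰 job
$gcloud dataflow jobs run my-wordcount \
    --gcs-location=gs://dataflow-templates/latest/Word_Count \
    --region=us-central1 \
    --parameters=inputFile=gs://my-bucket/input.txt,output=gs://my-bucket/results/out
$gcloud dataflow jobs list --region=us-central1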

BigQuery🔎 is a fully‑managed, petabyte scale⚖️, low‑cost analytics data warehouse

  • Analytics database🛢; stream data at 100,000 rows/sec
  • Provides near real-time interactive analysis of massive datasets (hundreds of TBs) using SQL syntax (SQL 2011)
  • Compute and storage 🗄 are separated with a terabit network in between
  • Only pay for storage 🗄 and processing used
  • Automatic discount for long-term data storage 🗄
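
A quick taste of that SQL interface from the bq command-line tool, using a public dataset:

# Top five most common names in a BigQuery🔎 public dataset, standard SQL
$bq query --use_legacy_sql=false \
  'SELECT name, SUM(number) AS total
   FROM `bigquery-public-data.usa_names.usa_1910_2013`
   GROUP BY name
   ORDER BY total DESC
   LIMIT 5'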

Cloud Pub/Sub – Scalable ⚖️, flexible🧘‍♀️ and reliable enterprise messaging 📨

“Pub” in Pub/Sub is short for publishers, and “Sub” is short for subscribers.

  • Supports many-to-many asynchronous messaging📨
  • Application components make push/pull subscriptions to topics
  • Includes support for offline consumers
  • Simple, reliable, scalable ⚖️ foundation for stream analytics
  • Building block🧱 for data ingestion in Dataflow, IoT📲, Marketing Analytics
  • Foundation for Dataflow streaming⛲️
  • Push notifications for cloud-based☁️ applications
  • Connect applications across GCP (push/pull between Compute Engine and App Engine)
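
The basic publish/subscribe loop from the CLI (topic and subscription names are placeholders):

# Create a topic and a pull subscription attached to it
$gcloud pubsub topics create my-topic
$gcloud pubsub subscriptions create my-sub --topic=my-topic
# Publish a message 📨, then pull it back from the subscription
$gcloud pubsub topics publish my-topic --message="hello world"
$gcloud pubsub subscriptions pull my-sub --auto-ack --limit=1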

Cloud Datalab🧪 is a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on GCP.

  • Interactive tool for large-scale⚖️ data exploration, transformation, analysis, and visualization
  • Integrated, open source
    • Built on Jupyter

“Domo arigato misuta Robotto” 🤖

Cloud Machine Learning Platform🤖

Cloud☁️ machine‑learning platform🤖 provides modern machine‑learning services🤖 with pre‑trained models and a platform to generate your own tailored models.

TensorFlow 🧮 is an open‑source software library 📚 that’s exceptionally well suited for machine‑learning applications🤖 like neural networks🧠.

TensorFlow 🧮 can also take advantage of Tensor 🧮 processing units (TPU), which are hardware devices designed to accelerate machine‑learning 🤖 workloads with TensorFlow 🧮. GCP makes them available in the cloud☁️ with Compute Engine virtual machines.

Generally, applications that use machine‑learning platform🤖 fall into two categories, depending on whether the data being worked on is structured or unstructured.

For structured data, ML 🤖 can be used for various kinds of classification and regression tasks, like customer churn analysis, product diagnostics, and forecasting. It also covers detection of anomalies, such as fraud detection, sensor diagnostics, or log metrics.

For unstructured data, ML 🤖 can be used for image analytics, such as identifying damaged shipments, identifying styles, and flagging🚩 content. In addition, ML🤖 can be used for text analytics like call 📞 center log analysis, language identification, topic classification, and sentiment analysis.

Cloud Vision API 👓 derives insights from your images in the cloud☁️ or at the edge with AutoML Vision👓, or uses pre-trained Vision API👓 models to detect emotion, understand text, and more.

  • Analyze images with a simple REST API
  • Logo detection, label detection
  • Gain insights from images
  • Detect inappropriate content
  • Analyze sentiment
  • Extract text
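
For example (hedged; the Cloud Storage🗄 path is a placeholder), the gcloud CLI wraps several of those calls:

# Detect labels and text in an image stored in Cloud Storage
$gcloud ml vision detect-labels gs://my-bucket/photo.jpg
$gcloud ml vision detect-text gs://my-bucket/photo.jpg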

Cloud Natural Language API 🗣extracts information about people, places, events, (and more) mentioned in text documents, news articles, or blog posts

  • Uses machine learning🤖 models to reveal structure and meaning of text
  • Extract information about items mentioned in text documents, news articles, and blog posts
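
A couple of hedged one-liners against the API (the sample strings are arbitrary):

# Sentiment and entity analysis on ad-hoc text
$gcloud ml language analyze-sentiment --content="I love this product, it works great"
$gcloud ml language analyze-entities --content="Google Cloud was announced in San Francisco"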

Cloud Speech API 💬 enables developers 👩‍💻 to convert audio to text.

  • Transcribe your content in real time or from stored files
  • Deliver a better user experience in products through voice 🎤 commands
  • Gain insights from customer interactions to improve your service
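
A minimal hedged sketch (the audio file path and language are placeholders):

# Transcribe a short audio file stored in Cloud Storage🗄
$gcloud ml speech recognize gs://my-bucket/audio.flac --language-code=en-US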

Cloud Translation API🈴 provides a simple programmatic interface for translating an arbitrary string into a supported language.

  • Translate arbitrary strings between thousands of language pairs
  • Programmatically detect a document’s language
  • Support for dozens of languages

Cloud Video Intelligence API📹 enables powerful content discovery and engaging video experiences.

  • Annotate the contents of videos
  • Detect scene changes
  • Flag inappropriate content
  • Support for a variety of video formats

“Fly away, high away, bye bye…” 🦋

We will continue next week with Part II of this series….

Thanks –

–MCS

Week of September 25th

“Dynamite🧨 with a laser beam💥…Guaranteed to blow💨 your mind🧠

Happy National Lobster 🦞 Day!

“And here I go again on my own… Goin’ down the only road I’ve ever known”

This week we continued where we left off as we kept exploring the depths of SQL Server 2019. Last week, we merely scratched 💅 the surface of SQL Server 2019 as we dove🤿 into IQP and the improvements made to TempDB. This week we tackled Microsoft’s most ambitious SQL Server offering to date: SQL Server 2019 Big Data Clusters (BDC). When I first thought of BDCs, the first thing that came to mind 🤔 was a Twix Bar 🍫. Yes, we all know Twix is the “only candy with the cookie crunch”, but what makes the Twix bar so delicious 😋 is the perfect melding of smooth chocolate, chewy caramel and, of course, crisp cookie🍪! Well, that’s exactly what a Big Data Cluster is like… You’re probably thinking: Huh?

Big Data Clusters (BDC) is MSFT’s groundbreaking new Big Data/Data Lake architecture that unifies virtual business data and operational data stored in relational databases with IoT for true real-time BI and embedded Artificial Intelligence (AI) and Machine Learning (ML). BDC combines the power⚡️of SQL Server, Spark 🔥, and the Hadoop Distributed File System (HDFS) 🐘 into a unified data platform. But that’s not all!  Since BDC runs natively on Linux🐧 it’s able to embrace modern architectures for deploying applications like Linux-based Docker🐧- 🐳 containers on Kubernetes ☸︎.

By leveraging K8s ☸︎ for orchestration, deployments of BDCs are predictable, fast 🏃🏻, elastic🧘‍♀️ and scalable ⚖️. Seeing that Big Data Clusters can run in any Kubernetes ☸︎ environment, whether on-premises 🏠 (i.e. Red Hat OpenShift) or in the cloud☁️ (i.e. Amazon EKS), BDC makes a perfect fit to be hosted on Azure Kubernetes Service (AKS).

Another great feature that BDC makes use of is data virtualization, also known as “Polybase“. Polybase made its original debut with SQL Server 2016. It had seemed like Microsoft had gone to sleep😴 on it, but now with SQL Server 2019 BDC, Microsoft has broken out big time! BDC takes advantage of data virtualization as its “data hub”, so you don’t need to spend the time 🕰 and expense💰 of traditional extract, transform, and load (ETL) 🚜 and data pipelines. In addition, it lets organizations leverage existing SQL Server expertise and extract value from third-party data sources such as NoSQL, Oracle, Teradata and HDFS🐘.

Lastly, BDC takes advantage of Azure Data Studio (ADS) for both deployment and administration of BDC. For those who are not familiar with ADS, it’s a really cool 😎 tool 🛠 that you can benefit from by acquainting yourself with. Of course, SSMS isn’t going anywhere, but with ADS you get a lightweight cross-platform database tool 🛠 that uses Jupyter notebooks📒 and Python🐍 scripting, making deployment and administration of BDCs a breeze. OK, I am ready to rock and roll 🎶!

“I love❤️ rock n’ roll🎸… So put another dime in the jukebox 📻, baby”

Before we could jump right into the deep end of the pool 🏊‍♂️ with Big Data Clusters, we felt the need for a little primer on Virtualization, Kubernetes ☸︎, and Containers. Fortunately, we knew just who could deliver the perfect overview: none other than one of the prominent members of the Mount Rushmore of SQL Server, Buck Woody, through his guest appearance with long-time product team member and Technology Evangelist Sanjay Soni in the Introduction to Big Data Cluster on SQL Server 2019 | Virtualization, Kubernetes ☸︎, and Containers YouTube video. Buck has a unique style and an amazing skill for taking complex technologies and making them seem simple. His supermarket 🛒 analogy to explain Big Data, Hadoop and Spark 🔥, virtualization, containers, and Kubernetes ☸︎ is pure brilliance! Now, armed 🛡🗡 with Buck’s knowledge bombs 💣, we were ready to light 🔥 this candle 🕯!

“Let’s get it started (ha), let’s get it started in here”

Taking us through BDC architecture and deployments were newly minted Microsoft Data Platform MVP Mohammad Darab, who put together a series of super exciting videos as well as excellent, detailed blog posts on BDC, and long-time SQL Server veteran and Microsoft Data Platform MVP Ben Weissman, through his awesome Building Your First Microsoft SQL Server Big Data Cluster Pluralsight course. Ben’s spectacular course covers not only architecture and deployment but also how to get data in and out of a BDC, how to make the most out of it, and how to monitor, maintain, and troubleshoot BDC.

“If you have built castles 🏰 in the air, your work need not be lost; that is where they should be. Now put the foundations 🧱 under them.” – Henry David Thoreau

A big data cluster consists of several major components:

  • Controller (Control plane)
    • Master Instance – manages connectivity (endpoints and communication with the other Pools 🏊‍♂️), scale-out ⚖️ queries, metadata and user databases (the target for database restores), and machine learning services.

Data not residing in your master instance will be exposed through the concept of external tables. Examples are CSV files on an HDFS store or data on another RDBMS. External tables can be queried the same as local tables on the SQL Server, with several caveats:

  • You cannot modify the table structure or content
  • No indexes can be applied (only statistics are kept; SQL Server houses that metadata)
  • The data source might require you to provide credentials 🔐
  • The data source may also need a format definition, like text qualifiers or separators for CSV files 🗃
  • Compute Pool 🏊‍♂️
  • Storage Pool 🏊‍♂️
  • Data Pool 🏊‍♂️
  • Application Pool 🏊‍♂️

The controller provides management and security for the cluster and acts as the control plane for the cluster. It takes care of all the interactions with K8s ☸︎, the SQL Server instances that are part of the cluster, and other components like HDFS🐘, Spark🔥, Kibana, Grafana, and Elasticsearch.

The controller manages:

  • Cluster lifecycle: bootstrap, delete, update, etc.
  • Master SQL Server Instance
  • Compute, data, and Storage Pools 🏊‍♂️
  • Cluster Security 🔐

The Compute Pool 🏊‍♂️ is a set of stateless SQL Server 2019 instances that work together. The Compute Pool 🏊‍♂️ leverages data virtualization (Polybase) to scale out⚖️ queries across partitions. The Compute Pool 🏊‍♂️ is automatically provisioned as part of BDC. Management and patching of Compute Pools 🏊‍♂️ is easy because they run in Docker 🐳 containers on K8s☸️ pods.

Please note: Queries on BDC can also function without the Compute Pool 🏊‍♂️.  

 Compute Pool 🏊‍♂️ is responsible for:

  • Joining of two or more directories 📂 in HDFS🐘 with 100+ files
  • Joining of two or more data sources
  • Joining multiple tables with different partitioning or distribution schemes
  • Data stored in Blob Storage

The Storage Pool 🏊‍♂️ consists of pods comprising SQL Server on Linux🐧 and Spark🔥 on HDFS 🐘 (deployed automatically). All nodes of the BDC are members of an HDFS🐘 cluster. The Storage Pool 🏊‍♂️ stores file‑based 🗂 data, like CSVs, which can be queried directly through external tables and T‑SQL, or with Python🐍 and Spark🔥. If you already have HDFS🐘 data in either Azure Data Lake Store (ADLS) or AWS S3 buckets🗑, you can easily mount your existing storage into your BDC without the need to shift all your data around.

The Storage Pool 🏊‍♂️ is responsible for:

  • Data ingestion through Spark🔥
  • Data storage in HDFS🐘 (Parquet format). HDFS🐘 data is spread across all storage nodes in the BDC for persistence
  • Data access through HDFS🐘 and SQL Server Endpoints

The Data Pool 🏊‍♂️ is used for data persistence and caching. Under the covers, the Data Pool 🏊‍♂️ is a set of SQL Servers (defined at deployment) that use columnstore indexes and sharding. In other words, SQL Server creates physical tables with the same structure and evenly distributes the data across the total number of servers. Queries are also distributed across servers, but to the user this is transparent, as all the magic✨🎩 happens behind the scenes. The data stored in the Data Pool 🏊‍♂️ does not support transactions, as its sole purpose is caching. It is used to ingest data from SQL queries or Spark🔥 jobs. BDC data marts are persisted in the Data Pool 🏊‍♂️.

Data Pool 🏊‍♂️ is used for:

  • Complex Query Joins
  • Machine Learning
  • Reporting

The Application Pool 🏊‍♂️ is used to run jobs like SSIS, store and execute ML models, and host all kinds of other applications, which are generally exposed through a web service.

“This is 10% luck🍀… 20% skill… 15% concentrated power🔌 of will… 5% pleasure😆… 50% pain🤕… And a 100% reason to remember the name”

After both great overviews of the BDC architecture by Mo and Ben, we were eager to build this spaceship 🚀. First, we needed to download the required tools 🛠🧰 so we could happily😊 have a base machine from which to deploy our first BDC. Just to spice 🥵 things up, I have chosen the Mac 💻 as my base machine.

Below are the Required Tools:

  • Azure Data Studio (ADS)
  • ADS Extension for Data Virtualization
  • Python 🐍
  • Kubectl ☸️
  • azdata
  • Azure CLI (only required if using AKS)

Optional Tools:

  • Text Editor 🗒
  • PowerShell Extension for ADS
  • Zip utility Software 🗜
  • SQL Server Utilities
  • SSH

Now that we had all our prerequisites downloaded, we next needed to determine where we should deploy our BDC. The most natural choice seemed to be AKS.

To help walk🚶‍♂️ us through the installation of our base machine and the deployment of BDC using ADS on AKS, we once again turned to Mo Darab, who put together an excellent and easy-to-follow series of videos, Deploying Big Data Clusters, on his YouTube channel.

Base Machine

In his video “How to Set Up a Base Machine”, Mo used a Windows machine, as opposed to us, who went with the Mac 💻. But for all intents and purposes the steps are pretty much the same. The only difference is the package📦 manager that we need to use: on Windows the recommended package manager is Chocolatey 🍫, while on the Mac 💻 it’s Brew 🍺.

Here were the basic steps:

  • Install kubectl ☸️
    • brew install kubernetes-cli
  • Install Azure CLI
    • brew update && brew install azure-cli
  • Install Python
    • brew install python3
  • Install azdata
    • brew tap microsoft/azdata-cli-release
    • brew update
    • brew install azdata-cli
  • Install ADS Extension for Data Virtualization
  • Install Pandas (manage package option in ADS / Add New Pandas / Click Install)

“Countdown commencing, fire one”

Now that we had our base machine up and running 🏃‍♂️, it was time ⏱ to deploy. To guide us through the deployment process, we once again went to Mo and followed along with his How to Deploy Big Data Cluster on AKS using Azure Data Studio. Mo walked us through the wizard 🧙‍♂️ in ADS, which basically creates a Notebook 📒 that builds our BDC. We created our deployment Jupyter notebook 📒, clicked Run 🏃‍♂️, and let it rip☄️. Everything seemed to be humming 🎶 along, except that two hours later our Jupyter notebook was still running 🏃‍♂️.

Obviously, something wasn’t right. 🤔 Unfortunately, we didn’t have much visibility into what was happening with the install through the notebook, but hey, that’s OK; we have a terminal prompt in ADS, so we could just run some K8s☸️ commands to see what was happening under the covers. After running a few commands, we noticed an “ImagePullBackOff” error with our SQL Server images. After a little research, we determined someone forgot 🙄 to update the Microsoft repo with the latest CU image.
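
These were the kinds of commands we ran (the namespace below is whatever you chose at deployment time; ours is shown as a placeholder):

# See every pod in the BDC namespace and its current state
$kubectl get pods -n mssql-cluster
# Dig into a stuck pod to see events such as ImagePullBackOff
$kubectl describe pod master-0 -n mssql-cluster
# Tail the namespace events in chronological order
$kubectl get events -n mssql-cluster --sort-by=.metadata.creationTimestamp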

“But there is no joy in Mudville—mighty Casey has struck out.”

So we filed a bug 🐞 on GitHub, re-ran the BDC wizard 🧙‍♂️, and changed the Docker🐳 settings to point to the next latest package available on the Docker🐳 Registry, and we were back in business until… Bam!🥊🥊

Operation failed with status: 'Bad Request'. Details: Provisioning of resource(s) for container service mssql-20200923030840 in resource group mssql-20200923030840 failed. Message: Operation could not be completed as it results in exceeding approved Total Regional Cores quota.

What shall we do? What does our Azure Fundamentals training tell us to do? That’s right: we go to the Azure Portal, submit a support ticket, and beg the good people at Microsoft to increase our Cores quota. So we did that, and almost instantaneously MSFT obliged. 😊

Great, we were back in business (well, sort of)… After several more attempts we ran into more quota issues with all three types (SKU, Static, and Basic) of Public IP Addresses. So, three more support tickets later, there was finally joy 😊 to the world🌎.

Next, we turned back to Ben, who provided some great demos in his Pluralsight course on how to set up Data Virtualization with both SQL Server and HDFS 🐘 files as data sources on BDC.

Data Virtualization (SQL Server)

  1. Right‑mouse click -> Virtualize Data
  2. The data virtualization wizard 🧙‍♂️ will launch
  3. The next step chooses the data source
  • Create a Master Key
  • Create a connection and specify the username and password
  • Next, the wizard🧙‍♂️ will connect to the source and list the tables and views
  • Choose the script option
  • Click on the specific table(s)

The external table inherits a schema from its source, and transformation would happen in your queries.

You can choose between just having them created or to generate a script.

CREATE EXTERNAL TABLE [virt].[PersonPhone]
(
    [BusinessEntityID] INT NOT NULL,
    [PhoneNumber] NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
    [PhoneNumberTypeID] INT NOT NULL,
    [ModifiedDate] SMALLDATETIME NOT NULL
)
WITH (LOCATION = N'[AdventureWorks2015].[Person].[PersonPhone]', DATA_SOURCE = [SQLServer]);

Data Virtualization (HDFS🐘)

  1. Right‑mouse‑click your HDFS 🐘 and create a new directory
  2. Upload the flight delay dataset to it
  3. Expand the directory 📁 to see the files 🗂
  • Right‑mouse‑clicking a file launches a wizard 🧙‍♂️ to virtualize data from CSV files
  • The wizard 🧙‍♂️ will ask you for the database in which to create the external table and a name for the data source
  • The next step shows a preview of your data
  • In the next step, the wizard 🧙‍♂️ recommends a column type

CREATE EXTERNAL DATA SOURCE [SqlStoragePool]
    WITH (LOCATION = N'sqlhdfs://controller-svc/default');

CREATE EXTERNAL FILE FORMAT [CSV]
    WITH (FORMAT_TYPE = DELIMITEDTEXT,
          FORMAT_OPTIONS (FIELD_TERMINATOR = N',', STRING_DELIMITER = N'"', FIRST_ROW = 2));

CREATE EXTERNAL TABLE [csv].[airlines]
(
    [IATA_CODE] nvarchar(50) NOT NULL,
    [AIRLINE] nvarchar(50) NOT NULL
)
WITH (LOCATION = N'/FlightDelays/airlines.csv', DATA_SOURCE = [SqlStoragePool], FILE_FORMAT = [CSV]);

Monitoring 🎛 Big Data Clusters through Azure Data Studio

BDC comes with a pre-deployed Grafana container for SQL Server and system metrics. It collects those metrics from every single node, container, and pod and provides individual dashboards 📊 for them.

The Kibana dashboard 📊 is part of the Elastic Stack and provides a looking glass🔎 into all of your log files in BDC.

Troubleshooting Big Data Clusters through Azure Data Studio

In ADS, on the main dashboard 📊, there is a Troubleshoot button. This provides a library📚 of notebooks 📒 that you can use to troubleshoot and analyze the cluster. The notebooks 📒 are categorized and cover all kinds of different aspects of monitoring 🎛 and diagnosing to help you repair 🛠 an issue within your cluster.

In addition, the azdata utility can be used for monitoring 🎛, running queries and notebooks 📒, and retrieving a cluster’s endpoints, namespace, and username.
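
A couple of the azdata calls we leaned on (a hedged sketch; the namespace comes from your own deployment):

# Log in to the cluster controller, then list all external endpoints
$azdata login --namespace mssql-cluster
$azdata bdc endpoint list -o table
# Check overall cluster health
$azdata bdc status show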

We really enjoyed spending time learning SQL Server 2019 Big Data Cluster. 😊

“And I feel like it’s all been done Somebody’s tryin’ to make me stay You know I’ve got to be movin’ on”

Below are some of the destinations I am considering for my travels for next week:

  • Google Cloud Certified Associate Cloud Engineer Path

Thanks –

–MCS

Week of June 5th

“Another dimension, new galaxy… Intergalactic, planetary”

Happy National Donut Emoji Day!

“There is an inexorable force in the cosmos, a place where time and space converge. A place beyond man’s vision…but not beyond his reach. Man has reached the most mysterious and awesome corner of the universe…a point where the here and now become forever…. A journey that takes you where no man has been before Experience the power⚡️! A journey that begins where everything nothing ends! You can’t escape the most powerful force in the ‘DevOps’ universe.”

Mission #7419I 

So once again, we boarded the USS Palomino 🚀 and continued our exploration of the far depths of the DevOps Universe. Just to pick up where we last left off, 👨‍✈️ Captain Bret Fisher had taken us through the Microservices galaxy 🌌 and straight to Docker🐳 and Containers. But… “with so many light years to go.. And things to be found” we continued through the courseware Docker Mastery: with Kubernetes +Swarm from a Docker Captain and reconnoitered Docker🐳 Compose, Docker🐳 Swarm, Docker🐳 Registries, and the infamous Kubernetes☸️.

Again, we leveraged the portability of HashiCorp’s Vagrant for Docker🐳 with Docker🐳 Compose, our 3-node Docker 🐳 Swarm, and the K8s☸️ environments. We were grateful for our previous experience with Vagrant from earlier learnings, as it made standing up these environments quite seamless.

We started off with Docker 🐳 Compose, which can be quite a useful tool in development for defining and running multi-container Docker🐳 applications. Next, we headed right over to Docker🐳 Swarm to get our initiation into Container Orchestration. You might ask why not just go straight to Kubernetes☸️, as it is the clear winner🏆 of the famous Container Orchestration wars? Well, orchestration is great for solving complex problems, but orchestrators themselves can be complex solutions to learn. From what we witnessed this week, we were glad we started there. We also learned that the “combination of Docker 🐳 Swarm, Stacks, and Secrets are kind of like a trilogy of awesome features” that can really make things easier if we went this route in production.
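
A hedged, minimal flavor of both (the image, port, and stack names are placeholders, not from the course):

# docker-compose.yml for local dev could be as small as:
#   version: "3.8"
#   services:
#     web:
#       image: nginx:alpine
#       ports:
#         - "8080:80"
$docker-compose up -d                              # bring the stack up locally
# The very same file can be handed to Swarm once a cluster exists
$docker swarm init
$docker stack deploy -c docker-compose.yml mystack
$docker service ls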

“Resistance is Futile“

If you’re not familiar with the story of Kubernetes☸️, affectionately known as K8s☸️… it came out of Google from the original developers who worked on the infamous Google “Borg” project. In fact, here is a little bit of trivia: the code name for the K8s☸️ project was Project Seven of Nine, a reference to the Star Trek🖖 character of the same name who was a “friendlier” Borg. K8s☸️ was certainly uncharted territory for us and a bit out of my purview, but it was a good learning experience to get a high-level overview of another important component of the infrastructure ecosystem.

Captain’s log, Stardate 73894.9. These are the missions covered in earnest this week:

  • Created a 3-node Swarm cluster in the cloud
  • Installed Kubernetes and learned the leading server cluster tools
  • Used Virtual IPs for built-in load balancing in your cluster
  • Optimized Dockerfiles for faster building and tiny deploys
  • Built/Published custom application images
  • Learned the differences between Kubernetes and Swarm
  • Created an image registry
  • Used Swarm Secrets to encrypt your environment configs
  • Created the config utopia of a single set of YAML files for local dev, CI testing, and prod cluster deploys
  • Deployed apps to Kubernetes
  • Made Dockerfiles and Compose files
  • Built multi-node Swarm clusters and deployed H/A containers
  • Made Kubernetes YAML manifests and deploy using infrastructure-as-code methods
  • Built a workflow of using Docker in dev, then test/CI, then production with YAML

For more details see the complete Log

This turned out to be quite the intensive undertaking this week, but we accomplished our mission and here is the certificate to prove it!

Below are some topics I am considering for my exploration next week:

  • Google Big Query
  • More with Data Pipelines
  • Google Cloud Data Fusion (ETL/ELT)
  • NoSQL – MongoDB, Cosmos DB
  • Working JSON Files
  • Working with Parquet files 
  • JDBC Drivers
  • More on Machine Learning
  • ONTAP Cluster Fundamentals
  • Data Visualization Tools (i.e. Looker)
  • ETL Solutions (Stitch, FiveTran) 
  • Process and Transforming data/Explore data through ML (i.e. Databricks)

Stay safe and Be well –

–MCS