Week of August 28th

Taking care of business (every way)

Happy National Bow Tie 🎀Day!

So last week, we dusted ourselves off, climbed 🧗‍♂️ back up on our horse 🐎, and reconvened with our continuous learning adventures. To get ourselves warmed🔥 up, we decided to renew our knowledge with PowerShell ⚡️🐚, but we still had some unfinished business to attend to. If you have been following our travels 🚀, you might remember that earlier this month we hit a bump in the road. But just like the armless Black Knight 🛡🗡 from Monty Python and the Holy Grail, it was just a mere “flesh wound”. Ok, perhaps not. Actually, at the time 🕰️ it stung 🐝 pretty badly 😞, but after a two-week investigation by the good folks at AWS ☁️, it was deemed that my test on the 4th of August had indeed been corrupted. They offered their sincere apology for the inconvenience and, more importantly, provided me with a voucher so I could reschedule the AWS Cloud Practitioner Exam at no cost. Well, this week I decided to re-take the exam 📝, and I am happy 😊 to report I passed without issue.

To help prepare for the exam📝, I purchased the Udemy course AWS Certified Cloud Practitioner 500 Practice Exam Questions. There were some similar questions taken from this course’s practice tests, but it seems AWS☁️ likes to keep their certified professionals honest, so there were quite a few questions I had never seen before.

So it’s highly recommended that, in addition to gaining practical experience working with AWS, you also review their courseware, fully understand core concepts like the AWS Well-Architected Framework, and have a good basic understanding of many of the AWS products and services.

Despite the obstacle earlier this month, it was a good experience preparing for the exam and ultimately passing and getting the certification. Now, we are even more well versed in the cloud ☁️ and have cloud ☁️ street cred to back it up. 😎

“And I’m on my way… I don’t know where I’m going 🤷🏻‍♂️… I’m on my way… I’m taking my time … But I don’t know where?”

Below are some areas I am considering for my travels next week:

  • Azure Fundamentals Courseware

Thanks –

–MCS

Week of July 31st

“A new day will dawn 🌄… For those who stand long”

Happy National Avocado🥑 Day!

Our journey 🚞 this week takes us back to our humble beginnings. Well, sort of… If you recall we began our magical✨ mystery tour of learnings back in March with AWS EC2. Then the last 2 weeks we re-routed course back to AWS, concentrating on AWS’s data services. So, we thought it might make sense to take one step 👣 back in order to take two steps 👣 👣 forward by focusing this week’s enlightenments on the fundamentals of the AWS Cloud☁️ and its Key🔑 concepts, core services, security🔐, architecture, pricing 💶, and support.

Fortunately, we knew the right place to load up on such knowledge. Where, of course, you ask? Why, none other than the fine folks at AWS Training, through their free online course AWS Cloud☁️ Practitioner Essentials (Second Edition). AWS spared no expense💰 by putting together an all-star🌟 lineup of AWS-ers led by Kirsten Dupart and an old familiar friend, Blaine Sundrud, along with Mike Blackmer, Raf Lopes, Heiwad Osman, Kent Rademacher, Russell Sayers, Seph Robinson, Andy Cummings, Ian Falconer, Wilson Santana, Wes Gruver, Tipu Qureshi, and Alex Buell.

The objective of the course was to highlight the following main areas:

  • AWS Cloud☁️ global infrastructure 
  • AWS Cloud☁️ architectural principles 
  • AWS Cloud☁️ value proposition
  • AWS Cloud☁️ overview of security🔐 and compliance
  • AWS Cloud☁️ overview of billing, account management, and pricing 💶 models

The course begins with an introduction to the concept of “Cloud☁️ Computing”, which of course is the on-demand availability of computing system resources, data Storage 🗄 and computing power⚡️, without direct active management by the user. Instead of having to design and build traditional data centers, Cloud☁️ computing enables us to access a data center and all of its resources via the Internet, or Cloud☁️.

Amazon Web Services (AWS) is a secure🔐 Cloud☁️ services platform, offering compute power⚡️, database Storage 🗄, content delivery, and other functionality to help businesses scale⚖️ up or down based on actual needs. There are five main areas that the AWS Cloud☁️ emphasizes: Scalability ⚖️, Agility, Elasticity🧘‍♂️, Reliability, and Security🔐.

  1. Scalability ⚖️ is the ability to resize your resources as necessary. AWS Cloud☁️ provides a scalable computing platform designed for high availability and dependability.
  2. Agility is the ability to increase speed🏃🏻, ease experimentation, and promote innovation. AWS empowers the user to seamlessly spin up servers in minutes, shut down servers when not needed, or reallocate unused resources for other purposes.
  3. Elasticity🧘‍♂️ is the ability to scale ⚖️ computing resources up or down easily. AWS makes it easy to quickly deploy new applications, scale⚖️ up as workloads increase, and shut down resources that are no longer required.
  4. Reliability is the ability of a system to recover from infrastructure or service failure. AWS provides reliability by hosting your instances and resources across multiple locations utilizing Regions, Availability Zones, and edge locations.
  5. Security 🔐 is the ability to retain complete control and ownership over your data and meet regional compliance and data residency requirements. AWS provides highly secure 🔐 data centers, continuous monitoring 🎛, and industry-leading capabilities across facilities, networks, software, and business processes.

There are three methods by which you can access AWS resources (the SDK route is sketched in code after this list):

  1. AWS management console which provides a graphical user interface (GUI) to access AWS services
  2. AWS command line interface (CLI) which allows you to control AWS services from the command line
  3. AWS Software Development kits (SDK) enables you to access AWS using a variety of programming languages
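To make the SDK option concrete, here is a minimal sketch using boto3, the AWS SDK for Python. It assumes your credentials are already configured (for example via aws configure) and simply lists your S3 buckets, the same result you would get from aws s3 ls in the CLI or from browsing the console.

```python
import boto3

# One action, three possible front doors: console (point and click),
# CLI (`aws s3 ls`), or this SDK call.
s3 = boto3.client("s3")

response = s3.list_buckets()
for bucket in response["Buckets"]:
    print(bucket["Name"], bucket["CreationDate"])
```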

Next, the course provides us with some brief vignettes covering the AWS core services, AWS Global Infrastructure, and AWS Integrated Services.

AWS Core Services

Elastic Compute Cloud☁️ (EC2) is a web service that provides secure, resizable compute capacity in the Cloud☁️. EC2 instances are “pay as you go”: you only pay for the capacity you use, and you can choose different Storage 🗄 configurations.

Key🔑 components of EC2 are listed below (and tied together in the sketch that follows):

  • Amazon machine image (AMI) which is an OS image used to build an instance
  • Instance Type refers to hardware capabilities (CPU, Memory)
  • Network – Public and Private IP addresses
  • Storage 🗄 – SSDs, Provisioned IOPs SSD, Magnetic disks
  • Keypairs🔑 (Secure) allow you to connect to instances after they are launched
  • Tags 🏷 provide a friendly name to identify resources in AWS.
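A minimal sketch of how those components come together when launching an instance with boto3; the AMI ID and key pair name are hypothetical placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # AMI: the OS image to build from (placeholder)
    InstanceType="t3.micro",           # instance type: CPU/memory profile
    KeyName="my-keypair",              # key pair used to connect after launch (placeholder)
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{               # tag: a friendly name for the instance
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "demo-instance"}],
    }],
)
print(response["Instances"][0]["InstanceId"])
```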

Elastic Block Store (EBS) provides persistent block-level Storage🗄 volumes for your EC2 instances (a snapshot-and-restore sketch follows the list below).

  • EBS volumes are designed to be durable and available; they are automatically replicated across multiple servers within their Availability Zone.
  • EBS volumes must be in the same AZ as the instances they are attached to.
  • EBS gives you the ability to create point-in-time⏰ snapshots of your volumes and allows AWS to create new volumes from a snapshot at any time⏰.
  • EBS volumes have the ability to increase capacity and change to different types
  • Multiple EBS volumes can be attached to an instance
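A minimal sketch of the snapshot-and-restore flow with boto3; the volume ID and AZ are hypothetical placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

# Take a point-in-time snapshot of an existing volume...
snap = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="point-in-time backup",
)

# ...then later restore it as a brand-new volume. Remember: the volume must
# land in the same AZ as the instance you plan to attach it to.
vol = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone="us-east-1a",
    VolumeType="gp3",
)
print(vol["VolumeId"])
```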

Simple Storage🗄 Service (S3) is a fully managed Storage🗄 service that provides a simple API for storing and retrieving data (a short put/get sketch follows the list below). S3 uses buckets🗑 to store data. S3 buckets🗑 are associated with a particular AWS region. When you store data in a bucket🗑, it’s redundantly stored across multiple AWS Availability Zones within that region.

  • S3 is serverless, so you do not need to manage any infrastructure.
  • S3 supports objects as large as several terabytes.
  • S3 also provides low latency access to data over HTTP or HTTPS
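A minimal sketch of that simple API with boto3; the bucket name is a hypothetical placeholder and must already exist.

```python
import boto3

s3 = boto3.client("s3")

# Store an object in a bucket...
s3.put_object(Bucket="my-demo-bucket", Key="hello.txt", Body=b"Hello, S3!")

# ...and read it back over HTTPS.
obj = s3.get_object(Bucket="my-demo-bucket", Key="hello.txt")
print(obj["Body"].read().decode())
```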

AWS Global Infrastructure

The AWS Global Infrastructure consists of Regions, Availability Zones, and Edge locations, providing highly available, fault-tolerant, and scalable infrastructure.

AWS Regions are separate geographical 🌎 areas that host two or more availability zones and are the organizing level for AWS services.

Availability zones are a collection of data centers within a specific region. Each availability zone is physically isolated from one another but connected together through low latency, high throughput, and highly redundant networking. AWS recommends provisioning your data across multiple availability zones.

As of April 2020, AWS spans 70 Availability Zones within 22 Regions around the globe 🌎.

Edge locations are where end users access services located at AWS. They are located in most of the major cities 🏙 around the world 🌎 and are specifically used by Amazon CloudFront🌩, a content delivery network (CDN) that distributes content to end users to reduce latency.

Amazon Virtual Private Cloud⛅️ (VPC) is a private network within the AWS Cloud☁️ that adheres to networking best practices as well as regulatory and organizational requirements. VPC is an AWS foundational service that integrates with many of the AWS services. VPC leverages the AWS global infrastructure of regions and availability zones, so it easily takes advantage of the high availability provided by AWS. VPCs exist within regions and can span multiple availability zones. You can create multiple subnets in a VPC, although fewer are recommended to limit the complexity of the network.
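A minimal sketch of carving out a VPC with one subnet via boto3; the CIDR ranges and Availability Zone are hypothetical choices.

```python
import boto3

ec2 = boto3.client("ec2")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# One subnet per Availability Zone is a common pattern; keeping the subnet
# count low limits network complexity.
subnet = ec2.create_subnet(
    VpcId=vpc_id,
    CidrBlock="10.0.1.0/24",
    AvailabilityZone="us-east-1a",
)
print(vpc_id, subnet["Subnet"]["SubnetId"])
```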

Security🔐 Groups act as a virtual 🔥 firewall for your virtual servers to control incoming and outgoing traffic🚦. They are another method to filter traffic🚦 to your instances, giving you control over what traffic🚦 to allow or deny. To determine who has access to your instances, you configure a Security🔐 group rule (sketched below).
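A minimal sketch of a security🔐 group rule with boto3: allow inbound SSH from a single address. The VPC ID and the CIDR are hypothetical placeholders.

```python
import boto3

ec2 = boto3.client("ec2")

sg = ec2.create_security_group(
    GroupName="demo-sg",
    Description="Allow SSH from my workstation",
    VpcId="vpc-0123456789abcdef0",
)

# The rule itself: who may connect, on which port and protocol.
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 22,
        "ToPort": 22,
        "IpRanges": [{"CidrIp": "203.0.113.10/32"}],
    }],
)
```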

AWS CloudFormation🌨 (“Infrastructure as Code”) allows you to use programming languages, JSON files, or simple text files to model and provision, in an automated and secure manner, all the resources needed for your applications across all regions and accounts.
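A minimal “infrastructure as code” sketch: a one-resource CloudFormation🌨 template expressed as JSON and deployed with boto3. The stack and bucket names are hypothetical placeholders.

```python
import json
import boto3

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "DemoBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {"BucketName": "my-cfn-demo-bucket"},
        }
    },
}

cfn = boto3.client("cloudformation")
cfn.create_stack(StackName="demo-stack", TemplateBody=json.dumps(template))
```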

“Every AWS service that you learned about is another tool 🛠 to build solutions. The more tools you can bring to the table, the more powerful 💪 you become.” -Andy Cummings

AWS Integrated Services

AWS offers a variety of services from A to Z, so it would be impossible to review every service in a six-hour course. Below are some of the services highlighted in the course:

Elastic Load Balancing 🏋🏻‍♀️ automatically distributes incoming application traffic🚦 across multiple AWS targets like EC2 instances, containers, IP addresses, and Lambda functions. There are three kinds of load balancers (see the sketch after this list): the Network Load Balancer🏋🏻‍♂️, the Classic Load Balancer 🏋🏻‍♂️ (ELB), and the Application Load Balancer🏋🏻‍♂️ (ALB).

  1. Network Load Balancer 🏋🏻‍♀️ is best suited for load balancing of TCP, UDP and TLS traffic🚦 where extreme performance is required.
  2. Classic Load Balancer 🏋🏻‍♀️ provides basic load balancing across multiple Amazon EC2 instances and operates at both the request level and connection level.
  3. Application Load Balancer 🏋🏻‍♀️ offers most of the features provided by the Classic Load Balancer 🏋🏻‍♀️ and adds some important features and enhancements. It’s best suited for load balancing of HTTP and HTTPS traffic🚦.
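A minimal sketch of creating an internet-facing ALB with boto3; the subnet and security🔐 group IDs are hypothetical placeholders (an ALB needs subnets in at least two AZs).

```python
import boto3

elbv2 = boto3.client("elbv2")

alb = elbv2.create_load_balancer(
    Name="demo-alb",
    Subnets=["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"],
    SecurityGroups=["sg-0123456789abcdef0"],
    Type="application",   # "network" would create a Network Load Balancer instead
    Scheme="internet-facing",
)
print(alb["LoadBalancers"][0]["DNSName"])
```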

AWS Auto Scaling⚖️ monitors 🎛 your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. Auto Scaling⚖️ removes the guesswork of how many EC2 instances you need at a point in time⏰ to meet your workload requirements. There are three core components needed: a launch configuration (“the what to deploy”), an Auto Scaling⚖️ group (“the where to deploy”), and an Auto Scaling⚖️ policy (“the when to deploy”), all three sketched in code below.

  • Dynamic auto scaling⚖️ is a common configuration used with AWS CloudWatch⌚️alarms based on performance information from your EC2 instance or load balancer🏋🏻‍♂️.

Auto Scaling⚖️ and Elastic Load Balancing 🏋🏻‍♀️ automatically scale⚖️ resources up or down based on demand. Backed by Amazon’s massive infrastructure, you have access to compute and Storage 🗄 resources whenever you need them.
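A minimal sketch of the three Auto Scaling⚖️ pieces with boto3: the launch configuration (“what”), the group (“where”), and a target-tracking policy (“when”). The AMI, subnet IDs, and names are hypothetical placeholders.

```python
import boto3

asg = boto3.client("autoscaling")

# "The what to deploy": a launch configuration.
asg.create_launch_configuration(
    LaunchConfigurationName="demo-lc",
    ImageId="ami-0123456789abcdef0",
    InstanceType="t3.micro",
)

# "The where to deploy": an Auto Scaling group spanning two subnets/AZs.
asg.create_auto_scaling_group(
    AutoScalingGroupName="demo-asg",
    LaunchConfigurationName="demo-lc",
    MinSize=1,
    MaxSize=4,
    VPCZoneIdentifier="subnet-0123456789abcdef0,subnet-0fedcba9876543210",
)

# "The when to deploy": scale to keep average CPU near 50%.
asg.put_scaling_policy(
    AutoScalingGroupName="demo-asg",
    PolicyName="keep-cpu-at-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,
    },
)
```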

Amazon Route 53 👮 is a global, highly available DNS service that allows you to easily register and resolve DNS names, providing a managed, reliable, and highly scalable ⚖️ way to route 👮‍♀️ end users to Internet applications. Route 53 offers multiple ways to route👮‍♀️ your traffic🚦, enabling you to optimize latency for your applications and users.

Amazon Relational Database Service (RDS🛢) is a database as a service (DBaaS) that makes provisioning, operating, and scaling⚖️ either up or out seamless. In addition, RDS🛢 makes other time-consuming administrative tasks such as patching and backups a thing of the past. Amazon RDS🛢 provides high availability and durability through the use of Multi-AZ deployments. It also lets you run your database instances in an Amazon VPC, which gives you control and security🔐.
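A minimal sketch of provisioning a Multi-AZ MySQL instance with boto3; the identifier and credentials are hypothetical placeholders (in practice, keep the password in a secrets store).

```python
import boto3

rds = boto3.client("rds")

rds.create_db_instance(
    DBInstanceIdentifier="demo-db",
    Engine="mysql",
    DBInstanceClass="db.t3.micro",
    AllocatedStorage=20,                 # GiB
    MasterUsername="admin",
    MasterUserPassword="change-me-now",  # placeholder only
    MultiAZ=True,                        # standby replica in a second AZ
)
```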

AWS Lambda is a compute service that lets you run code without provisioning or managing servers. AWS Lambda executes your code only when needed and scales⚖️ automatically to thousands of requests per second.
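Because there are no servers to manage, a Lambda function is just a handler; here is a minimal Python sketch (the event shape is hypothetical):

```python
import json

def lambda_handler(event, context):
    # `event` carries the request payload; `context` carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```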

AWS Elastic Beanstalk🌱 is an easy-to-use service for deploying and scaling web applications and services developed with Java☕️, .NET, PHP, Node.js, Python🐍, Ruby💎, Go, and Docker🐳 on familiar servers such as Apache, Nginx, Passenger, and IIS. Elastic Beanstalk🌱 employs Auto Scaling⚖️ and Elastic Load Balancing🏋🏻‍♂️ to scale⚖️ and balance workloads. It provides tools🛠 in the form of Amazon CloudWatch⌚️ to monitor 🎛 the health❤️ of deployed applications. It also provides capacity provisioning through its reliance on AWS S3 and EC2.

Amazon Simple Notification Service (SNS) 📨 is a highly available, durable, secure🔐, fully managed pub/sub messaging service like Google’s pub/sub that enables you to decouple microservices, distributed systems, and serverless applications. Additionally, SNS📨 can be used to fan out notifications to end users using mobile push, SMS, and email✉️.
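A minimal pub/sub sketch with boto3: create a topic, subscribe an email✉️ endpoint, and publish a message that fans out to every subscriber. The email address is a hypothetical placeholder.

```python
import boto3

sns = boto3.client("sns")

topic = sns.create_topic(Name="demo-topic")
sns.subscribe(TopicArn=topic["TopicArn"], Protocol="email", Endpoint="me@example.com")
sns.publish(TopicArn=topic["TopicArn"], Subject="Hello", Message="Fanning out to all subscribers")
```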

Amazon CloudWatch⌚️ is a monitoring 🎛 service that allows you to monitor 🎛 AWS resources and the applications you run 🏃🏻 on them in real time. Amazon CloudWatch⌚️ features include collecting and tracking metrics like CPU utilization, data transfer, and disk I/O and utilization. Some of the components that make up Amazon CloudWatch⌚️ include metrics, alarms, events, logs, and dashboards.
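A minimal sketch of one of those components, an alarm on EC2 CPU utilization, with boto3; the instance ID is a hypothetical placeholder.

```python
import boto3

cw = boto3.client("cloudwatch")

cw.put_metric_alarm(
    AlarmName="high-cpu",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
    Statistic="Average",
    Period=300,                # evaluate in 5-minute windows
    EvaluationPeriods=2,       # two consecutive breaches trigger the alarm
    Threshold=80.0,            # percent
    ComparisonOperator="GreaterThanThreshold",
)
```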

Amazon CloudFront🌩 uses a global 🌎 network of more than 80 edge locations and more than 10 regional edge caches for content delivery (CDN). It’s integrated with AWS services such as AWS Web Application🔥 Firewall, Certificate Manager, Route 53, and S3, as well as other AWS services.

AWS CloudFormation🌨 is a fully managed service which acts as an engine 🚂 to automate the provisioning of AWS resources. CloudFormation🌨 reads template files which specify the resources to deploy. The provisioned resources are known as a stack. Stacks 📚 can be created, updated, or deleted through CloudFormation🌨.

AWS Architecture

When one refers to AWS architecture, one need look no further than the AWS Well-Architected Framework. The AWS Well-Architected Framework originally began as a single whitepaper but expanded into more of a doctrine focused on key🔑 concepts, design principles, and architectural best practices for designing secure, high-performing, resilient, and efficient infrastructure and running 🏃🏻 workloads in the AWS Cloud☁️.

The AWS Well-Architected Framework is based on five pillars: operational excellence, security🔐, reliability, performance efficiency, and cost optimization.

  1. Operational excellence focuses on running 🏃🏻 and monitoring 🎛 systems and continually improving processes and procedures.
  2. Security🔐 centers on protecting information and systems.
  3. Reliability highlights that a workload performs consistently as intended and can quickly recover from failure.
  4. Performance efficiency concentrates on efficiently using computing resources.
  5. Cost optimization emphasizes cost avoidance.

Reference Architecture – Fault Tolerance and High Availability

Both Fault Tolerance and High Availability are cornerstones of infrastructure design strategies to keep critical applications and data up and running 🏃🏻.

Fault Tolerance refers to the ability of a system (computer, network, Cloud☁️ cluster, etc.) to continue operating without interruption when one or more of its components fail.

High availability refers to systems that are durable and likely to operate continuously, remaining functional and accessible, with downtime minimized as much as possible without the need for human intervention.

AWS provides services and infrastructure to build reliable, fault-tolerant, and highly available systems in the Cloud☁️.

Some AWS services that can assist in providing high availability:

  • Elastic load balancers🏋🏻‍♀️
  • Elastic IP addresses
  • Route 53 👮‍♀️
  • Auto scaling⚖️
  • CloudWatch⌚️

Some AWS services that provide fault tolerant tools are:

  • SQS
  • S3 🗄
  • RDS🛢

Amazon Web Services offers Cloud☁️ web 🕸 hosting solutions that provide businesses, non-profits, and governmental organizations with low-cost ways to deliver their websites and web 🕸 applications.

Security🔐

When it comes to security🔐, AWS doesn’t take things lightly. So much so that when you are a newbie to AWS, it can be quite challenging just to connect to your Cloud☁️ environment. The AWS global infrastructure is built to the highest standards to ensure privacy🤫 and data security🔐, with strong 💪 safeguards in place to protect customers’ privacy 🤫. AWS continuously improves and innovates its security🔐, incorporating customer feedback and changing requirements. AWS provides security🔐-specific tools🛠 and features across network security🔐, configuration management, access control, and data security🔐, along with monitoring 🎛 and logging tools🛠 that give full visibility👀 into what is happening in your environment. AWS provides several security🔐 capabilities and services, like built-in firewalls🔥, to increase privacy🤫 and control network access. In addition, AWS offers encryption of data both in transit and at rest in the Cloud☁️, and gives you capabilities to define, enforce, and manage user👤 access policies across AWS services.

The shared👫 responsibility model

AWS believes Security🔐 and Compliance is a shared👫 responsibility between AWS and the customer. The shared👫 responsibility model alleviates the customer’s operational burden, as AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security🔐 of the facilities in which the service operates. The customer assumes responsibility for and management of the guest operating system (including updates and security🔐 patches), other associated application software, as well as the configuration of the AWS-provided security🔐 group🔥 firewalls.

Security🔐 “of” the Cloud☁️ vs Security🔐 “in” the Cloud☁️

  • “Security🔐 of the Cloud☁️” – AWS is responsible for protecting the infrastructure that runs all of the services offered in the AWS Cloud☁️.
  • “Security🔐 in the Cloud☁️” – Customer responsibility is determined by the AWS Cloud☁️ services that a customer selects.

Inherited Controls are controls that the customer fully inherits from AWS.

Shared👫 Controls are controls which apply to both the infrastructure layer and the customer layers, but in completely separate contexts or perspectives. Examples include:

  • Patch Management – AWS is responsible for patching and fixing flaws within the infrastructure, but customers are responsible for patching their guest OS and applications.
  • Configuration Management – AWS maintains the configuration of its infrastructure devices, but a customer is responsible for configuring their own guest operating systems, databases, and applications.
  • Awareness & Training – AWS trains AWS employees, but a customer must train their own employees.

Customer Specific – Controls which are solely the responsibility of the customer, based on the application they are deploying within AWS services. An example is service and communications protection or zone security, which may require a customer to route or zone data within specific security🔐 environments.

AWS Cloud☁️ Security🔐 Infrastructure and services

AWS Identity and Access Management (IAM) is one of the core security🔐 services (at no additional charge) used to enforce security🔐 across all AWS service offerings. IAM provides authentication, authorization, user management, and a central user repository. In IAM, you can create and manage users, groups, and roles to either allow or deny access to specific AWS resources.

  • Users👤 are permanent named operators (either human or machine).
  • Groups 👥 are collections of users👤 (either human or machine).
  • Roles are an authentication method; the key🔑 distinction is that a role’s credentials are temporary.

As for permissions, they are enforced by a separate object known as a policy document 📜.

A policy document📜 is a JSON document📜 that can be attached directly to a user👤, a group👥, or a role.
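A minimal sketch of a policy document📜: JSON allowing read-only access to a single S3 bucket, attached to a user👤 with boto3. The user and bucket names are hypothetical placeholders.

```python
import json
import boto3

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-demo-bucket",
            "arn:aws:s3:::my-demo-bucket/*",
        ],
    }],
}

iam = boto3.client("iam")
iam.put_user_policy(
    UserName="demo-user",
    PolicyName="s3-read-only",
    PolicyDocument=json.dumps(policy),
)
```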

AWS CloudTrail 🌨 is a service that enables governance, compliance, operational auditing, and risk auditing. CloudTrail🌨 records all successful and declined authentication and authorization events.

Amazon Inspector 🕵️‍♂️ is an automated security🔐 and vulnerability assessment service that assesses applications for exposure, vulnerabilities, and deviations from best practices. Amazon Inspector produces a detailed list of security🔐 findings prioritized by level of severity, helping you to:

  • Identify Application Security🔐 Issues
  • Integrate Security🔐 into DevOps
  • Increase Development Agility
  • Leverage AWS Security🔐 Expertise
  • Streamline Security🔐 Compliance
  • Enforce Security🔐 Standards

AWS Shield🛡 is a managed Distributed Denial of Service (DDoS) protection service. There are two tiers of AWS Shield🛡: Standard and Advanced.

  • AWS Shield🛡Standard defends against most common, frequently occurring network and transport layer DDoS attacks that target your web site or applications.
  • AWS Shield🛡Advanced provides additional detection against large and sophisticated DDoS attacks, near real-time visibility into attacks, and integration with AWS web 🕸application 🔥 firewall (WAF).

Pricing and Support

AWS offers a wide range of Cloud☁️ computing services. For each service, you pay for exactly the amount of resources you actually use.

  • Pay as you go
  • Pay less when you reserve
  • Pay even less per unit by using more
  • Pay even less as your AWS Cloud☁️ grows

There are three fundamental characteristics you pay for with AWS:

  1. Compute 💻
  2. Storage 🗄
  3. Data transfer out ➡️

Although you are charged for data transfer out, there is no charge for inbound data transfer or for data transfer between other services within the same region.

AWS Trusted Advisor is an online tool🔧 that helps you optimize your AWS infrastructure, increase reliability, security🔐, and performance, reduce your overall costs, and monitor 🎛 your environment. AWS Trusted Advisor enforces AWS best practices in five categories:

  1. Cost optimization
  2. Performance
  3. Security🔐
  4. Fault tolerance
  5. Service limits

AWS offers four levels of support:

  1. Basic support plan (Included with all AWS Services)
  2. Developer support plan
  3. Business support plan
  4. Enterprise support plan

Obviously, there was a lot to digest😋, but now we have a great overall understanding of the AWS Cloud☁️ concepts, some of the AWS services, security🔐, architecture, pricing 💶, and support, and we feel confident continuing our journey in the AWS Cloud☁️. 😊

“This is the end, Beautiful friend.. This is the end, My only friend, the end”?

Below are some areas I am considering for my travels next week:

  • Neo4J and Cypher 
  • More with Google Cloud Path
  • ONTAP Cluster Fundamentals
  • Data Visualization Tools (i.e. Looker)
  • Additional ETL Solutions (Stitch, FiveTran) 
  • Process and Transforming data/Explore data through ML (i.e. Databricks)

Thanks –

–MCS

Week of July 24th

“And you may ask yourself 🤔, well, how did I get here?” 😲

Happy Opening⚾️ Day!

Last week, we hit a milestone of sorts with our 20th submission🎖 since we started our journey way back in March. 😊 To commemorate the occasion, we made a return to AWS by gleefully 😊 exploring their data ecosystem. Of course, trying to cover all the data services made available in AWS in such a short duration 🕰 would be a daunting task.

So last week, we concentrated our travels on three of their main offerings in the relational database, NoSQL, and data warehouse realms: these being, of course, RDS🛢, DynamoDB🧨, and Redshift🎆. We felt rather content 🤗 and enlightened💡 with AWS’s relational database and data warehouse offerings, but we were still feeling a little less satisfied 🤔 with NoSQL, as we had really just scratched the surface of what AWS had to offer.

To recap, we had explored 🔦 DynamoDB🧨, AWS’s multi-model NoSQL service, which offers support for key-value🔑 and their proprietary document📜 database. But we were still curious to learn more about a document📜 database in AWS that offers MongoDB🍃 support, as well as an answer to the hottest🔥 trend in “Not Just SQL” solutions: the Graph📈 Database.

Well, of course, being the Cloud☁️ provider that offers every Cloud☁️-native service from A to Z, AWS delivered with many great options. So we began our voyage heading straight over to DocumentDB📜, AWS’s fully managed database service with MongoDB🍃 compatibility. As with all AWS services, DocumentDB📜 was designed from the ground up to give the most optimal performance, scalability⚖️, and availability. DocumentDB📜, like the Cosmos DB🪐 MongoDB🍃 API, makes it easy to set up, operate, and scale MongoDB-compatible databases. In other words, no code changes are required, and all the same drivers can be utilized by existing legacy MongoDB🍃 applications.

In addition, DocumentDB📜 solves the friction and complications that arise when an application tries to map JSON to a relational model, by making JSON documents a first-class object of the database. Data is stored in the form of documents📜, and these documents📜 are stored in collections. Each document📜 can have a unique combination and nesting of fields or key-value🔑 pairs, making querying the database faster⚡️, indexing more flexible, and replication easier.

Similar to other AWS data offerings, the core unit that makes up DocumentDB📜 is the cluster. A cluster consists of one or more instances and a cluster storage volume that manages the data for those instances. All writes📝 are done through the primary instance. All instances (primary and replicas) support read 📖 operations. The cluster storage volume stores six copies of your data across three different Availability Zones. AWS easily allows you to create or modify clusters. When you modify a cluster, AWS is really just spinning up a new cluster behind the curtains and then migrating the data, taking what is otherwise a complex task and making it seamless.

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to place DocumentDB📜 in. You can leverage an existing one, or you can create a dedicated one just for DocumentDB📜. Next, you need to configure security🔒 groups for your VPC. Security🔒 groups are what control who has access to your Document📜 databases. As for credentials🔐 and entitlements, DocumentDB📜 manages them through AWS Identity and Access Management (IAM). By default, a DocumentDB📜 cluster accepts secure connections using Transport Layer Security (TLS), so all traffic in transit is encrypted. Amazon DocumentDB📜 uses the 256-bit Advanced Encryption Standard (AES-256) to encrypt your data, or allows you to encrypt your clusters using keys🔑 you manage through AWS Key🔑 Management Service (AWS KMS), so data at rest is always encrypted.
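Since the same MongoDB🍃 drivers work unchanged, here is a minimal connection sketch with pymongo (assuming a recent pymongo); the endpoint, credentials, and CA bundle path are hypothetical placeholders.

```python
from pymongo import MongoClient

client = MongoClient(
    "mongodb://demo-user:change-me@docdb-demo.cluster-abc123.us-east-1.docdb.amazonaws.com:27017",
    tls=True,
    tlsCAFile="global-bundle.pem",        # CA bundle for the default TLS connections
    replicaSet="rs0",
    readPreference="secondaryPreferred",  # spread reads across the replicas
)

db = client["travel"]
db["posts"].insert_one({"week": "July 24th", "topic": "DocumentDB"})
print(db["posts"].find_one({"week": "July 24th"}))
```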

“Such wonderful things surround you…What more is you lookin’ for?”

Lately, we have been really digging Graph📈 Databases. We had our first visit with Graph📈 Databases when we were exposed to the Graph📈 API through Cosmos DB🪐 earlier this month and then furthered our curiosity through Neo4J. Well, now armed with a plethora of knowledge in the Graph📈 Database space we wanted to see what AWS had to offer and once again they did not disappoint.😊

First, let me start by writing that it’s a little difficult to compare AWS Neptune🔱 to Neo4J, although Nous Populi from Leapgraph does an admirable job. Obviously, both are graph📈 databases, but architecturally there are some major differences in their graph storage models and query languages. Neo4J uses Cypher, while Neptune🔱 uses Apache TinkerPop’s Gremlin👹 (the same as Cosmos DB🪐) as well as SPARQL. Where Neptune🔱 really shines☀️ is that it’s not just another graph database but a great service offering within the AWS portfolio. So, it leverages all the great bells🔔 and whistles like fast⚡️ performance, scalability⚖️, high availability, and durability, as well as being a fully managed service with everything we have come to expect, like handling hardware provisioning, software patching, backup, recovery, failure detection, and repair. Neptune🔱 is optimized for storing billions of relationships and querying the graph with millisecond latency.

Neptune🔱 uses database instances. The primary database instance supports both read📖 and write📝 operations and performs all the data modifications to the cluster. Neptune🔱 also uses replicas, which connect to the same cloud-native☁️ storage service as the primary database instance but support read-only operations. There can be up to 15 of these replicas across multiple AZs. In addition, Neptune🔱 supports encryption at rest.

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to place Neptune🔱 in. You can leverage an existing one, or you can create a dedicated one just for Neptune🔱. Next, you need to configure security🔒 groups for your VPC. Security🔒 groups are what control who has access to your Neptune🔱 cluster. Credentials🔐 and entitlements in Neptune🔱 are managed through AWS Identity and Access Management (IAM). Your data at rest in Neptune🔱 is encrypted using the industry-standard AES-256-bit encryption algorithm on the server that hosts your Neptune🔱 instance. Keys🔑 can also be used, which are managed through AWS Key🔑 Management Service (AWS KMS).
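A minimal sketch of talking Gremlin👹 to Neptune🔱 with the Apache TinkerPop driver for Python (gremlinpython); the cluster endpoint is a hypothetical placeholder.

```python
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

conn = DriverRemoteConnection(
    "wss://neptune-demo.cluster-abc123.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# Add two vertices and an edge, then walk the graph.
g.addV("person").property("name", "Alice").as_("a") \
 .addV("person").property("name", "Bob") \
 .addE("knows").from_("a").iterate()

print(g.V().hasLabel("person").values("name").toList())
conn.close()
```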

“Life moves pretty fast⚡️. If you don’t stop 🛑 and look 👀 around once in a while, you could miss it.”

So now feeling pretty good 😊 about NoSQL on AWS, where do we go now? 

Well, as mentioned, we have been learning so much over the last five months that it could be very easy to forget some things, especially with limited storage capacity. So we decided to take a pause for the rest of the week and go back and review all that we have learned by re-reading all our previous blog posts, as well as engaging in some Google Data Engineering Quests🛡 to help reinforce our previous learnings.

Currently, the fine folks at qwiklabs.com are offering anyone who wants to learn Google Cloud ☁️ skills an opportunity for unlimited access for 30 days. So, with an offer too good to be true, as well as an opportunity to add some flair to our LinkedIn profile (and who doesn’t like flair?), we dove right in head first!

“Where do we go? Oh, where do we go now? Now, now, now, now, now, now, now”

Below are some topics I am considering for my travels next week:

  • OKTA SSO
  • Neo4J and Cypher 
  • More with Google Cloud Path
  • ONTAP Cluster Fundamentals
  • Data Visualization Tools (i.e. Looker)
  • Additional ETL Solutions (Stitch, FiveTran) 
  • Process and Transforming data/Explore data through ML (i.e. Databricks)

Thanks

—MCS

Week of July 17th

“Any time of year… You can find it here”

Happy World🌎 Emoji 😊😜 Day! 

The last few weeks we have been enjoying our time in Microsoft’s Cloud☁️ Data Ecosystem, and it was just last month that we found ourselves hanging out with the GCP☁️ gang and their awesome data offerings. All seemed well and good😊, except that we had been missing out on an excursion to the one cloud☁️ provider where it all began, literally and figuratively.

Back when we first began our journey on a cold 🥶 and rainy☔️ day in March just as Covid-19🦠 occupied Wall Street 🏦 and the rest of the planet 🌎 we started at the one place that disrupted how infrastructure and operations would be implemented and supported going forward.

That’s right: Amazon Web Services, or as it’s more endearingly known to humanity, AWS. AWS was released just two decades ago by its parent company that offers everything from A to Z.

AWS, like its parent company, has a similar mantra in the cloud ☁️ computing world, as they offer 200+ Cloud☁️ services. So how the heck, with so many months passed, have we not been back since? The question is quite perplexing. But like they say, “all Clouds☁️☁️ lead to AWS.” So, here we are, back in the AWS groove 🎶 and eager 😆 to explore 🔦 the wondrous world🌎 of AWS data solutions. Guiding us through this vast journey would be Richard Seroter (who ironically recently joined the team at Google). In 2015, Richard authored an amazing Pluralsight course covering Amazon RDS🛢, Amazon DynamoDB 🧨, and Amazon Redshift 🎆. It was like getting 3 courses for the price of 1! 😊

Although the course was several years old, for the most part it still stood the test of time ⏰ by providing strong foundational knowledge of Amazon’s relational, NoSQL, and data warehousing solutions. But unfortunately, technology years are kind of like dog🐕 years. So obviously, there have been many innovations across all three of these incredible solutions, including UI enhancements, architectural improvements, and additional features, making these great AWS offerings even more awesome!

So, for a grand finale to our marvelous week of learning, helping us fill in the gaps on some of these major enhancements as well as offering some additional insights, was the team from AWS Training and Certification: the talented fashionista Michelle Metzger, the always effervescent and insightful Blaine Sundrud, and, on demos, the man with a quirky naming convention for database objects, the always witty Stephen Cole.

Back in our Amazon Web Services Databases in Depth course, and in an effort to make our journey that much more captivating, Richard provided us with a nifty mobile sports 🏀 ⚾️ 🏈 app solution written in Node.js which leverages the Amazon data offerings covered in the course as components of an end-to-end solution. As the solution was written several years back, it did require some updating of deprecated libraries📚 and some code changes in order to make it work, which made our learning that much more fulfilling. So, after a great introduction from Richard, where he compares and contrasts RDS🛢, DynamoDB🧨, and Redshift🎆, we began our journey with Amazon’s Relational Database Service (RDS🛢). RDS🛢 is a database as a service (DBaaS) that makes provisioning, operating, and scaling⚖️ either up or out seamless. In addition, RDS🛢 makes other time-consuming administrative tasks such as patching and backups a thing of the past. Amazon RDS🛢 provides high availability and durability through the use of Multi-AZ deployments. In other words, AWS creates multiple instances of the databases in different Availability Zones, making recovery from infrastructure failure automatic and almost transparent to the application. Of course, as with all AWS offerings, there is always a heavy emphasis on security🔐, which is certainly reassuring when you are putting your mission-critical data in their hands 🤲, but it can also be a bit challenging at first to get things up and running when you are simply trying to connect from your home computer 💻 to the AWS infrastructure.

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to put your RDS🛢 instance(s) in. You can leverage an existing one or you can create a dedicated one for your RDS🛢 instance(s).

It is required that your VPC have at least two subnets in order to support the Availability Zones for high availability. If direct internet access is needed, you will need to add an internet gateway to your VPC.

Next, you need to configure security🔒 groups for your VPC. Security🔒 groups are what control who has access to the RDS🛢 instance(s). RDS🛢 leverages three types of security groups (database, VPC, and EC2). As for credentials🔐 and entitlements in RDS🛢, they are managed through AWS Identity and Access Management (IAM). At the time of the release of Richard’s course, Amazon Aurora was new to the game and was not covered in depth. Also at that time, only MySQL, Postgres, Oracle, MS SQL Server, and the aforementioned Aurora were supported. AWS has since added support for MariaDB to their relational database service portfolio.

Fortunately, our friends from the AWS Training and Certification group gave us the insights that we would require on Amazon’s innovation behind their relational database built for the cloud☁️ better known as Aurora.

So, with six familiar database engines (licensing costs apply) to choose from, you have quite a few options. Another key🔑 decision is determining the resources you want your database to have. RDS🛢 offers multiple options optimized for memory, performance, or I/O.

I would be remiss if we didn’t briefly touch on Amazon’s Aurora. As mentioned, it’s one of Amazon’s six database options with RDS🛢. Aurora is fully managed by RDS🛢, so it leverages the same great infrastructure and has all the same bells 🔔 and whistles. Aurora comes in two different flavors🍦: MySQL and PostgreSQL. Although I didn’t benchmark Aurora performance in my evaluation, AWS claims that Aurora is 5x faster than standard MySQL databases. However, it’s probably more like 3x faster. But the bottom line is that it is faster and more cost-effective for MySQL or PostgreSQL databases that require optimal performance, availability, and reliability. The secret sauce behind Aurora is that it automatically maintains six copies of your data (which can be increased up to 15 replicas) spanned across 3 AZs, making data highly available and ultimately providing laser⚡️-fast performance for your database instances.

Please note: there is an option that allows a single Aurora database to span multiple AWS Regions 🌎 for an additional cost.

In addition, Aurora uses an innovative and significantly faster log structured distributed storage layer than other RDS🛢offerings.

“Welcome my son, welcome to the machine🤖”

Next on our plate 🍽 was to take a deep dive into Amazon’s fast and flexible NoSQL database service, a.k.a. DynamoDB🧨. DynamoDB🧨, like Cosmos DB🪐, is a multi-model NoSQL solution.

DynamoDB🧨 combines the best of those two ACID-compliant non-relational databases in a key-value🔑 and document database. It is a proprietary engine, so you can’t just take your MongoDB🍃 database and convert it to DynamoDB🧨. But don’t worry, if you are looking to move your MongoDB🍃 workloads to Amazon, AWS offers Amazon DocumentDB (with MongoDB compatibility), but that’s for a later discussion 😉

As for DynamoDB🧨, it delivers blazing⚡️ single-digit-millisecond guaranteed performance at any scale⚖️. It’s a fully managed, multi-Region, multi-master database with built-in security🔐, backup and restore options, and in-memory caching for internet-scale applications. DynamoDB🧨 automatically scales⚖️ up and down to adjust capacity and maintain the performance of your systems. Availability and fault tolerance are built in, eliminating the need to architect your applications for these capabilities. An important concept to grasp while working with DynamoDB🧨 is that the databases are composed of tables, items, and attributes. Again, there have been some major architectural design changes to DynamoDB🧨 since Richard’s course was released. Not to go into too many details, as it’s kind of irrelevant now, but at the time⏰ the course was released, DynamoDB🧨 offered the option to use either a hash primary key🔑 or a hash-and-range primary key🔑 to organize or partition data, and, as you would imagine, choosing the right combination was rather confusing. Sensibly, AWS scrapped this architectural design, and the good folks at the AWS Training and Certification group were so kind as to offer clarity here as well 😊

Today, DynamoDB🧨 uses partition keys🔑 to find each item in the database, similar to Cosmos DB🪐. Data is distributed on physical storage nodes. DynamoDB🧨 uses the partition key🔑 to determine which of the nodes the item is located on. It’s very important to choose the right partition key 🔑 to avoid the dreaded hot 🔥 partitions. Again, “as a rule of thumb 👍, an ideal partition key🔑 should have a wide range of values, so your data is evenly spread across logical partitions.” Also, in DynamoDB🧨, items can have an optional sort key🔑 to store related attributes in a sorted order.

One major difference from Cosmos DB🪐 is that DynamoDB🧨 utilizes a primary key🔑 on each table. If there is no sort key🔑, the primary and partition keys🔑 are the same. If there is a sort key🔑, the primary key🔑 is a combination of the partition and sort key 🔑, which is called a composite primary key🔑.
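A minimal sketch of a table with a composite primary key🔑 (partition key🔑 plus sort key🔑) using boto3; the table and attribute names are hypothetical placeholders.

```python
import boto3

ddb = boto3.client("dynamodb")

ddb.create_table(
    TableName="GameScores",
    KeySchema=[
        {"AttributeName": "PlayerId", "KeyType": "HASH"},   # partition key
        {"AttributeName": "GameDate", "KeyType": "RANGE"},  # sort key
    ],
    AttributeDefinitions=[
        {"AttributeName": "PlayerId", "AttributeType": "S"},
        {"AttributeName": "GameDate", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",  # on-demand capacity
)
```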

DynamoDB🧨 allows for secondary indexes for faster searches. It supports two types of indexes: local (up to 5 per table) and global (up to 20 per table). These indexes can help improve the application’s ability to access data quickly and efficiently.

Differences Between Global and Local Secondary Indexes

GSI                                | LSI
-----------------------------------+------------------------------------------
Hash, or hash and range key        | Hash and range key
No size limit                      | For each key, 10GB max
Add during table create, or later  | Add during table create
Query all partitions in a table    | Query single partition
Eventually consistent queries      | Eventually/strongly consistent queries
Dedicated throughput units         | Uses the table’s throughput units
Only access projected items        | Access all attributes from table

DynamoDB🧨, like Cosmos DB🪐, offers multiple data consistency levels. DynamoDB🧨 offers both eventually and strongly consistent reads, but like I said previously, “it’s like life itself, there are always tradeoffs.” So, depending on your application needs, you will need to determine what’s most important for your application: latency or availability.

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to put DynamoDB🧨 in. You can leverage an existing one or you can create a dedicated one for DynamoDB🧨. Next, you need to configure security🔒 groups for your VPC. Security🔒 groups are what control who has access to DynamoDB🧨. As for authentication🔐 and permission to access a table, they are managed through Identity and Access Management (IAM). DynamoDB🧨 provides end-to-end enterprise-grade encryption for data both in transit and at rest. All DynamoDB tables have encryption at rest enabled by default, encrypting all your data using encryption keys🔑 stored in the AWS Key🔑 Management Service, or AWS KMS.

“Quicker than a ray of light I’m flying”

Our final destination for this week’s explorations would be Amazon’s fully managed, fast, scalable data warehouse known as Redshift🎆. A “red shift🎆” is when a wavelength of light is stretched, so the light is seen as ‘shifted’ towards the red part of the spectrum, but according to anonymous sources, “RedShift🎆 was apparently named very deliberately as a nod to Oracle’s trademark red branding, and Salesforce is calling its effort to move onto a new database ‘Sayonara.’” Be that as it may, this would be the third cloud☁️ data warehouse solution we would have the pleasure of becoming acquainted with. 😊

AWS claims Redshift🎆 delivers 10x faster performance than other data warehouses. We didn’t have a chance to benchmark Redshift’s 🎆 performance, but based on some TPC tests versus some of their top competitors, there might be some discrepancies with these claims. In either case, it’s still pretty darn fast.

Redshift🎆 uses massively parallel processing (MPP) and a columnar storage architecture. The core unit that makes up Redshift🎆 is the cluster. The cluster is made up of one or more compute nodes. There is a single leader node and several compute nodes. Client access to Redshift🎆 is via a SQL endpoint on the leader node. The client sends a query to the endpoint. The leader node creates jobs based on the query logic and sends them in parallel to the compute nodes. The compute nodes contain the actual data the queries need. The compute nodes find the required data, perform operations, and return results to the leader node. The leader node then aggregates the results from all of the compute nodes and sends a report back to the client.
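Since Redshift🎆 speaks the PostgreSQL wire protocol, a minimal sketch of querying that leader-node SQL endpoint can use psycopg2 (assumed installed); the endpoint, credentials, and table are hypothetical placeholders.

```python
import psycopg2

conn = psycopg2.connect(
    host="demo-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,  # Redshift's default port
    dbname="dev",
    user="awsuser",
    password="change-me-now",  # placeholder only
)

with conn.cursor() as cur:
    # The leader node plans this query and fans the work out to the compute
    # nodes, then aggregates their results.
    cur.execute("SELECT COUNT(*) FROM sales;")
    print(cur.fetchone())
conn.close()
```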

The compute nodes themselves are individual servers, they have their own dedicated memory, CPU, and attached disks. An individual compute node is actually split up into slices🍕, one slice🍕 for every core of that node’s processor. Each slice🍕 is given a piece of memory, disk, and so forth, where it processes whatever part of the workflow that’s been assigned to it by the leader node.

The way columnar database storage works is that data is stored by columns rather than by rows. This allows for fast retrieval of columns of data. An additional advantage is that, since each block holds the same type of data, block data can use a compression scheme selected specifically for the column data type, further reducing disk space and I/O. Again, there have been several architectural changes to Redshift🎆 as well since Richard’s course was released.

In the past, you needed to pick a distribution style. Today, you still have the option to choose a distribution style, but if you don’t specify one, Redshift🎆 uses AUTO distribution, making it a little easier not to make the wrong choice 😊. Another recent innovation to Redshift🎆 that didn’t exist when Richard’s course was released is the ability to build a unified data platform. Amazon Redshift🎆 Spectrum allows you to run queries across your data warehouse and Amazon S3 buckets simultaneously, allowing you to save time ⏰ and money💰, as you don’t need to load all your data into the data warehouse.

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to place Redshift🎆 in. You can leverage an existing one or you can create a dedicated one just for Redshift🎆. In addition, you will need to create an Amazon Simple Storage Service (S3) bucket and an S3 endpoint to be used with Redshift🎆. Next, you need to configure security🔒 groups for your VPC. Security🔒 groups are what control who has access to your data warehouse. As for credentials 🔐 and entitlements in Redshift🎆, they are managed through AWS Identity and Access Management (IAM).

One last point worth mentioning is that Amazon CloudWatch ⌚️ is included with all the tremendous Cloud☁️ services offered by AWS, so you get great monitoring 📉 right out of the box! 😊 We enjoyed 😊 our time⏰ this week in AWS exploring 🔦 some of their data offerings, but we merely scratched the surface.

So much to do, so much to see… So, what’s wrong with taking the backstreets? You’ll never know if you don’t go. You’ll never shine if you don’t glow

Below are some topics I am considering for my travels next week:

  • More with AWS Data Solutions
  • OKTA SSO
  • Neo4J and Cypher 
  • More with Google Cloud Path
  • ONTAP Cluster Fundamentals
  • Data Visualization Tools (i.e. Looker)
  • Additional ETL Solutions (Stitch, FiveTran) 
  • Process and Transforming data/Explore data through ML (i.e. Databricks)

Thanks

—MCS