Week of July 24th

“And you may ask yourself 🤔, well, how did I get here?” 😲

Happy Opening⚾️ Day!

Last week, we hit a milestone of sorts with our 20th submission🎖 since we started our journey way back in March.😊 To commemorate the occasion, we returned to AWS by gleefully 😊 exploring its data ecosystem. Of course, trying to cover all the data services available in AWS in such a short duration 🕰 would be a daunting task.

So last week, we concentrated our travels on three of their main offerings in the Relational Database, NoSQL, and Data Warehouse realms: RDS🛢, DynamoDB🧨, and Redshift🎆. We felt rather content 🤗 and enlightened💡 with AWS's Relational Database and Data Warehouse offerings, but we were left a little less satisfied 🤔 with NoSQL, as we had really just scratched the surface of what AWS has to offer.

To recap, we had explored 🔦 DynamoDB🧨, AWS's multi-model NoSQL service, which supports both a key-value🔑 store and a proprietary document📜 database. But we were still curious to learn whether AWS also offers a document📜 database with MongoDB🍃 support, as well as an answer to the hottest🔥 trend in "Not Just SQL" solutions: the Graph📈 Database.

Well, of course, being the Cloud☁️ Provider that offers every Cloud☁️-native service from A to Z, AWS delivered with many great options. So we began our voyage heading straight over to DocumentDB📜, AWS's fully managed database service with MongoDB🍃 compatibility. As with all AWS services, DocumentDB📜 was designed from the ground up for optimal performance, scalability⚖️, and availability. Like the Cosmos DB🪐 MongoDB🍃 API, DocumentDB📜 makes it easy to set up, operate, and scale MongoDB-compatible databases. In other words, no code changes are required, and existing legacy MongoDB🍃 applications can keep using the same drivers.
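To make the "no code changes" point concrete, here's a minimal sketch of pointing an existing MongoDB🍃 application at DocumentDB📜 using the stock pymongo driver. The cluster endpoint, user, and password below are made-up placeholders, not a real setup.

```python
# Hypothetical example: same MongoDB driver, just a different endpoint.
from pymongo import MongoClient

# Placeholder DocumentDB cluster endpoint and credentials (not real).
client = MongoClient(
    "mongodb://myuser:mypassword@"
    "my-docdb-cluster.cluster-xxxxxxxxxxxx.us-east-1.docdb.amazonaws.com:27017/"
    "?tls=true&tlsCAFile=global-bundle.pem&retryWrites=false"
)

db = client["travel"]
print(db.list_collection_names())  # legacy application code keeps working as-is
```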

In addition, DocumentDB📜 removes the friction and complications that arise when an application tries to map JSON to a relational model, by making JSON documents a first-class object of the database. Data is stored in the form of documents📜, and these documents📜 are grouped into collections. Each document📜 can have its own unique combination and nesting of fields, or key-value🔑 pairs. This makes querying the database faster⚡️, indexing more flexible, and replication easier.
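As a rough illustration of that document model, here's what storing and querying a nested document📜 might look like, reusing the hypothetical `travel` database and pymongo connection from the sketch above:

```python
# Hypothetical example: documents with nested fields, no relational mapping needed.
trips = db["trips"]  # collections are created on first use

trips.insert_one({
    "traveler": "MCS",
    "week_of": "July 24th",
    "services": {
        "document": ["DocumentDB"],
        "graph": ["Neptune"],
    },
})

# Index a nested field, then query on it directly.
trips.create_index("services.graph")
print(trips.find_one({"services.graph": "Neptune"}))
```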

Similar to other AWS data offerings, the core unit of DocumentDB📜 is the cluster. A cluster consists of one or more instances and a cluster storage volume that manages the data for those instances. All writes📝 go through the primary instance, while all instances (primary and replicas) support read 📖 operations. The cluster storage volume keeps six copies of your data across three different Availability Zones. AWS makes it easy to create or modify clusters; when you modify a cluster, AWS is really just spinning up a new cluster behind the curtain and migrating the data, taking what is otherwise a complex task and making it seamless.
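For a sense of how little ceremony is involved, here's a hedged boto3 sketch of creating a cluster and its first (primary) instance; the identifiers, instance class, and credentials are all hypothetical:

```python
# Hypothetical example: provisioning a DocumentDB cluster with boto3.
import boto3

docdb = boto3.client("docdb", region_name="us-east-1")

docdb.create_db_cluster(
    DBClusterIdentifier="my-docdb-cluster",
    Engine="docdb",
    MasterUsername="myuser",
    MasterUserPassword="mypassword",
)

# The first instance attached to the cluster becomes the primary (writer);
# additional instances would serve as read replicas.
docdb.create_db_instance(
    DBInstanceIdentifier="my-docdb-instance-1",
    DBInstanceClass="db.r5.large",
    Engine="docdb",
    DBClusterIdentifier="my-docdb-cluster",
)
```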

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to place DocumentDB📜 in. You can leverage an existing one or create a dedicated one just for DocumentDB📜. Next, you need to configure security🔒 groups for your VPC; security🔒 groups are what control who has access to your Document📜 databases. Credentials🔐 and entitlements in DocumentDB📜 are managed through AWS Identity and Access Management (IAM). By default, DocumentDB📜 clusters accept secure connections using Transport Layer Security (TLS), so all traffic in transit is encrypted. For data at rest, Amazon DocumentDB📜 encrypts your data with the 256-bit Advanced Encryption Standard (AES-256), or lets you encrypt your clusters using keys🔑 you manage through AWS Key🔑 Management Service (AWS KMS), so data at rest is always encrypted.
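The security🔒 group piece might look something like this sketch, which opens the DocumentDB📜 port (27017) only to an application subnet; the VPC ID and CIDR range are made up:

```python
# Hypothetical example: a security group that limits who can reach DocumentDB.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="docdb-access",
    Description="Allow the app subnet to reach DocumentDB",
    VpcId="vpc-0123456789abcdef0",  # placeholder VPC
)

ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 27017,
        "ToPort": 27017,
        "IpRanges": [{"CidrIp": "10.0.1.0/24"}],  # application subnet only
    }],
)
```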

“Such wonderful things surround you…What more is you lookin’ for?”

Lately, we have been really digging Graph📈 Databases. We had our first visit with Graph📈 Databases when we were exposed to the Graph📈 API through Cosmos DB🪐 earlier this month, and then furthered our curiosity through Neo4J. Well, now armed with a plethora of knowledge in the Graph📈 Database space, we wanted to see what AWS had to offer, and once again it did not disappoint.😊

First, let me start by writing that it's a little difficult to compare AWS Neptune🔱 to Neo4J, although Nous Populi from Leapgraph does an admirable job. Obviously, both are graph📈 databases, but architecturally there are some major differences in their graph storage models and query languages. Neo4J uses Cypher, while Neptune🔱 uses Apache TinkerPop's Gremlin👹 (the same as Cosmos DB🪐) as well as SPARQL. Where Neptune🔱 really shines☀️ is that it's not just another graph database but a great service offering within the AWS portfolio. So, it leverages all the great bells🔔 and whistles like fast⚡️ performance, scalability⚖️, high availability, and durability, as well as being the kind of fully managed service we have grown accustomed to, handling hardware provisioning, software patching, backup, recovery, failure detection, and repair. Neptune🔱 is optimized for storing billions of relationships and querying the graph with millisecond latency.
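To give a flavor of Gremlin👹 against Neptune🔱, here's a small, hypothetical traversal using the gremlinpython client; the cluster endpoint is a placeholder and the vertices are invented purely for illustration:

```python
# Hypothetical example: a Gremlin traversal against a Neptune endpoint.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection(
    "wss://my-neptune-cluster.cluster-xxxxxxxxxxxx.us-east-1.neptune.amazonaws.com:8182/gremlin",
    "g",
)
g = traversal().withRemote(conn)

# Create two vertices and a relationship, then walk the edge back.
g.addV("person").property("name", "MCS").next()
g.addV("database").property("name", "Neptune").next()
g.V().has("person", "name", "MCS").addE("explores").to(
    __.V().has("database", "name", "Neptune")
).next()

print(g.V().has("person", "name", "MCS").out("explores").values("name").toList())
conn.close()
```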

Neptune🔱 uses database instances. The primary database instance supports both read📖 and write📝 operations and performs all data modifications to the cluster. Neptune🔱 also uses replicas, which connect to the same cloud-native☁️ storage as the primary database instance but support read-only operations. There can be up to 15 of these replicas across multiple AZs. In addition, Neptune🔱 supports encryption at rest.
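Provisioning follows the same pattern we saw with DocumentDB📜; here's a hedged boto3 sketch of creating an encrypted Neptune🔱 cluster, its primary instance, and one read replica (all identifiers are placeholders):

```python
# Hypothetical example: a Neptune cluster with encryption at rest plus a read replica.
import boto3

neptune = boto3.client("neptune", region_name="us-east-1")

neptune.create_db_cluster(
    DBClusterIdentifier="my-neptune-cluster",
    Engine="neptune",
    StorageEncrypted=True,  # AES-256 at rest; a KMS key you manage can also be supplied
)

# Primary (writer) instance.
neptune.create_db_instance(
    DBInstanceIdentifier="my-neptune-primary",
    DBInstanceClass="db.r5.large",
    Engine="neptune",
    DBClusterIdentifier="my-neptune-cluster",
)

# Read replica: shares the cluster's storage and serves read-only traffic.
neptune.create_db_instance(
    DBInstanceIdentifier="my-neptune-replica-1",
    DBInstanceClass="db.r5.large",
    Engine="neptune",
    DBClusterIdentifier="my-neptune-cluster",
)
```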

As a prerequisite, you first must create and configure a virtual private cloud☁️ (VPC) to place Neptune🔱 in. You can leverage an existing one or create a dedicated one just for Neptune🔱. Next, you need to configure security🔒 groups for your VPC; security🔒 groups are what control who has access to your Neptune🔱 cluster. Credentials🔐 and entitlements in Neptune🔱 are managed through AWS Identity and Access Management (IAM). Your data at rest in Neptune🔱 is encrypted using the industry-standard AES-256 encryption algorithm on the server that hosts your Neptune🔱 instance. Keys🔑 you manage through AWS Key🔑 Management Service (AWS KMS) can also be used.

“Life moves pretty fast⚡️. If you don’t stop 🛑 and look 👀 around once in a while, you could miss it.”

So now feeling pretty good 😊 about NoSQL on AWS, where do we go now? 

Well, as mentioned, we have been learning so much over the last 5 months that it could be very easy to forget some things, especially with limited storage capacity. So we decided to take a pause for the rest of the week and go back and review all that we have learned, by re-reading all our previous blog posts as well as engaging in some Google Data Engineering solution Quests🛡 to help reinforce our previous learnings.

Currently, the fine folks at qwiklabs.com are offering anyone who wants to learn Google Cloud ☁️ skills unlimited access for 30 days. So, with an offer too good to be true, as well as an opportunity to add some flair to our LinkedIn profile (and who doesn't like flair?), we dove right in head first!

“Where do we go? Oh, where do we go now? Now, now, now, now, now, now, now”

Below are some topics I am considering for my travels next week:

  • OKTA SSO
  • Neo4J and Cypher 
  • More with Google Cloud Path
  • ONTAP Cluster Fundamentals
  • Data Visualization Tools (e.g., Looker)
  • Additional ETL Solutions (Stitch, FiveTran) 
  • Processing and transforming data / exploring data through ML (e.g., Databricks)

Thanks

—MCS