sqlserver | SQL Squirrels

Week of September 25th

Posted on September 25, 2020 by Mark Shay

“Dynamite🧨 with a laser beam💥…Guaranteed to blow💨 your mind🧠”

Happy National Lobster 🦞 Day!

“And here I go again on my own… Goin’ down the only road I’ve ever known”

This week we continued where we last left off the previous week as we continued exploring the depths of SQL Server 2019. Last week, we just merely scratched 💅 the surface of SQL Server 2019 as we dove🤿 into IQP and the improvements made to TempDB. This week we tackled Microsoft’s most ambitious SQL Server offering to date in SQL Server 2019 Big Data Clusters (BDC). When I first thought of BDCs the first thing that came to mind 🤔 was a Twix Bar 🍫. Yes, we all know Twix is the is the “only candy with the cookie crunch” but what makes the Twix bar so delicious 😋 is the perfect melding of smooth Chocolate, Chewy Carmel and of course crisp cookie🍪! Well, that’s exactly what Big Data Cluster is like… You’re probably thinking Huh?

Big Data Clusters (BDC) is MSFT’s groundbreaking new Big Data/Data Lake architecture that unifies virtual business data and operational data stored in relational databases with IoT for true real-time BI and embedded Artificial Intelligence (AI) and Machine Learning (ML). BDC combines the power⚡️of SQL Server, Spark 🔥, and the Hadoop Distributed File System (HDFS) 🐘 into a unified data platform. But that’s not all! Since BDC runs natively on Linux🐧 it’s able to embrace modern architectures for deploying applications like Linux-based Docker🐧- 🐳 containers on Kubernetes ☸︎.

By Leveraging K8s ☸︎ for orchestration, deployments of BDCs are predictable, fast 🏃🏻, elastic🧘‍♀️ and scalable ⚖️. Seeing that Big data clusters can run any Kubernetes ☸︎ environment whether it be on-premise 🏠 (i.e. Red Hat OpenShift) or in the cloud☁️ (i.e. Amazon EKS); BDC makes a perfect fit to be hosted on Azure Kubernetes Service (AKS).

Another great feature that BDC makes use of is data virtualization, also known as “Polybase“. Polybase made its original debut with SQL Server 2016. It had seemed like Microsoft had gone to sleep😴 on it but now with SQL Server 2019 BDC, Microsoft has broken out big time! BDC takes advantage of Data Virtualization as it’s “data hub”. So, you don’t need to spend time 🕰 and expense💰 of traditional extract, transform, and load (ETL) 🚜 and Data Pipelines. In addition, it lets organizations leverage existing SQL Server expertise and extract value from third-party data sources such as NoSQL, Oracle, Teradata and HDFS🐘.

Lastly, BDC takes advantage of Azure Data Studio (ADS) for both deployments and administration of BDC. For those who are not familiar with ADS, it’s a really cool 😎 tool 🛠 that you can benefit from by acquainting yourself with. Of course, SSMS isn’t going anywhere but with ADS you get a cross-platform database lightweight tool 🛠 that uses Jupyter notebooks📒 and Python🐍 scripting making deployments and administration of BDCs a breeze. OK, I am ready to rock and roll 🎶!

“I love❤️ rock n’ roll🎸… So put another dime in the jukebox 📻, baby”

Before we can jump right into the deep end of the Pool 🏊‍♂️ with Big Data Cluster, we felt a need for a little primer on Virtualization, Kubernetes ☸︎, and Containers. Fortunately, we knew just who to deliver the perfect overview, no other then one of the prominent members of the Mount Rushmore of SQL Server, Buck Woody who through his guest appearance with Long time product team member and Technology Evangelist Sanjay Soni in the Introduction to Big Data Cluster on SQL Server 2019 | Virtualization, Kubernetes ☸︎, and Containers YouTube Video. Buck who has a very unique style and an amazing skill of taking complex technologies and making them seem simple. His supermarket 🛒 analogy to explain Big Data, Hadoop and Spark 🔥, virtualization, containers, and Kubernetes ☸︎ is pure brilliance! Now, armed 🛡🗡 with Buck’s knowledge bombs 💣 we were ready to light 🔥 this candle 🕯!

“Let’s get it started (ha), let’s get it started in here”

Taking us through BDC architecture and deployments was newly minted Microsoft Data Platform MVP Mohammad Darab who put together a series of super exciting videos as well as excellent detailed blog posts on BDC and Long Time SQL Server veteran and Microsoft Data Platform MVP Ben Weissman through his awesome Building Your First Microsoft SQL Server Big Data Cluster Pluralsight course. Ben’s speculator course covers not only architecture and deployment but how to get data in and out of the, make the most of out of them and how to monitor, maintain, and troubleshoot BDC.

“If you have built castles 🏰 in the air, your work need not be lost; that is where they should be. Now put the foundations 🧱 under them.” – Henry David Thoreau

A big data cluster consists of several major components:

Controller (Control plane)
- Master Instance – manage connectivity (endpoints and communication with the other Pools 🏊‍♂️), scale-out ⚖️ queries, metadata and user databases (target for Restore databases), and machine learning services.

Data not residing in your master instance will be exposed through the concept of External tables. Examples are CSV files on an HDFS store or Data on another RDBMS. External tables can be queried the same as local tables on the SQL Server with several cavorts:

Unable modify the table structure or content
No Indexes can be applied (only statistics are kept. SQL Server to houses that meta data.
Data source might require you to provide credentials 🔐.
Data source may also need a format definition like text qualifiers or separators for CSV files 🗃

Compute Pool 🏊‍♂️
Storage Pool 🏊‍♂️
Data Pool 🏊‍♂️
Application Pool 🏊‍♂️

The controller provides management and security for the cluster and acts as the control plane for the cluster. It takes care of all the interactions with K8s ☸︎, the SQL Server instances that are part of the cluster and other components like HDFS🐘, Spark🔥, Kibana, Grafana, and Elastic Search

The controller manages:

Cluster lifecycle, bootstrap, delete update, etc.
Master SQL Server Instance
Compute, data, and Storage Pools 🏊‍♂️
Cluster Security 🔐

The Compute Pool 🏊‍♂️ is a set a stateless multiple SQL Server 2019 instances that work together. The Compute Pool 🏊‍♂️ leverage Data Virtualization or Polybase to scale-out⚖️ queries across partitions. The Compute Pool 🏊‍♂️ is automatically provisioned as part of BDC. Management and Patching of Computer Pools 🏊‍♂️ is easy because they run on Docker 🐳 containers running on K8☸️ pods.

Please note: Queries on BDC can also function without the Compute Pool 🏊‍♂️.

Compute Pool 🏊‍♂️ is responsible for:

Joining of two or more directories 📂 in HDFS🐘 with 100+ files
Joining of two or more data sources
Joining multiple tables with different partitioning or distribution schemes
Data stored in Blob Storage

The Storage Pool 🏊‍♂️ consists of pods comprised of SQL Servers on Linux🐧 and Spark🔥 on HDFS 🐘 (deployed automatically). All the Nodes of the BDC are members of an HDFS🐘 Cluster. The Storage Pool 🏊‍♂️ stores file‑based 🗂 data like a CSV and queried directly through external tables and T‑SQL, or you can use Python🐍 and Spark🔥. If you already have an HDFS🐘 on either Azure Data Lake (ADLS) store or AWS S3 buckets🗑 you can easily mount your existing storage into your BDC without the need to shift all your data around.

The Storage Pool 🏊‍♂️ is responsible for:

Data ingestion through Spark🔥
Data Storage in HDFS🐘 (Parquet format). HDFS🐘 data is spread across all storage nodes in the BDC for persistency
Data access through HDFS🐘 and SQL Server Endpoints

The Data Pool 🏊‍♂️ is used for data persistence and caching. Under the covers the Data Pool 🏊‍♂️ is a set of SQL Servers (Defined at Deployment) that are using Columnstore Index and Sharding. In other words, SQL Server will create physical tables with same structure and evenly distribute the data across the total number of servers. The Queries will be also be distributed across server but to the user it will be transparent as all the magic✨🎩 is happening behind the scenes. The data stored in the data Pool 🏊‍♂️ does not support transactions as its sole purpose is for caching. It is used to ingest data from SQL Queries or Spark🔥 Jobs. BDC data marts are persisted in the data Pool 🏊‍♂️.

Data Pool 🏊‍♂️ is used for:

Complex Query Joins
Machine Learning
Reporting

The application Pool 🏊‍♂️ is used to run jobs like SSIS, store and execute ML models, and all kinds of other applications which are generally exposed through a web service.

“This is 10% luck🍀… 20% skill… 15% concentrated power🔌 of will… 5% pleasure😆… 50% pain🤕… And a 100% reason to remember the name”

After both great overviews of the BDC architecture by both Mo and Ben, we were eager to build this spaceship 🚀. First, we need to download the required tools 🛠🧰 so we can happily😊 have our base machine where we can deploy our first BDC. I have chosen the Mac 💻 just to spice 🥵 things up as my base machine.

Below are the Required Tools:

Azure Data Studio (ADS)
ADS Extension for Data Virtualization
Python 🐍
Kubectl ☸️
azdata
Azure-CLI (and only required if using AKS)

Optional Tools:

Text Editor 🗒
PowerShell Extension for ADS
Zip utility Software 🗜
SQL Server Utilities
SSH

Now that we got all our prerequisites downloaded, we next needed to determine where we should deploy our BDC. The most natural choice seemed to be with AKS.

To help walk🚶‍♂️us through the installation of our base machine and the deployment of BDS using ADS on AKS, we once again turned to Mo Darab who put together an excellent and easy to follow along series of videos Deploying Big Data Clusters on his YouTube channel.

Base Machine

In his video “How to set Up a Base Machine”, Mo used a Windows machine opposed to us who went with the Mac 💻 . But for all intents and purposes the steps are pretty much the same. The only difference is the package📦 manager that we need to use. Windows the recommended package manager is Chocolatey 🍫 while on the Mac 💻 its Brew 🍺.

Here were the basic steps:

Install ADS for MAC
Install Brew
1. /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.s
Install Kubectl and Azure-CLI

§  brew install kubernetes-cli

§  brew update && brew install azure-cli

Install Python
- brew install python3
Install azdata
- brew tap microsoft/azdata-cli-release
- brew update
- brew install azdata-cli
Install ADS Extension for Data Virtualization
Install Pandas (manage package option in ADS/ Add New Pandas/ Click Install)

“Countdown commencing, fire one”

Now, we had our base machine up and running 🏃‍♂️, it was time ⏱ to deploy. To guide us through the deployment process, we once again went to Mo and followed along in his How to Deploy Big Data Cluster on AKS using Azure Data Studio. Mo walked us through the wizard 🧙‍♂️ in ADS which basically creates a Notebook 📒 that builds our BDC. We created our deployment Jupyter notebook 📒 and then clicked Run 🏃‍♂️ and let it rip☄️. Everything seemed to be humming 🎶 along except 2 hours later our Jupyter notebook was still running 🏃‍♂️.

Obviously, something wasn’t right? 🤔 Unfortunately, we didn’t’ have much visibility on what’s happening with the install through the notebook but hey that’s ok we have a terminal prompt in ADS. So we can just run some K8☸️ commands to see what’s happening under the covers. Ok, so after running a few K8 commands we noticed “ImagePullBackOff” error with our SQL Server Images. After a little bit research, we determined someone forgot 🙄to update the Microsoft Repo with the latest CU Image.

“But there is no joy in Mudville—mighty Casey has struck out.”

So we filled a bug 🐞 on GitHub and we ran the BDC wizard 🧙‍♂️ changed the Docker🐳 settings pointing to the next latest package available on the Docker🐳 Registry and we were back in business until… And then Bam!🥊🥊

Operation failed with status: 'Bad Request'. Details: Provisioning of resource(s) for container service mssql-20200923030840 in resource group mssql-20200923030840 failed. Message: Operation could not be completed as it results in exceeding approved Total Regional Cores quota.

What shall we do? What does our Azure Fundamentals training tells us to do? That’s right we go to Azure Portal and submit a support ticket and beg the good people at Microsoft to increase our Cores Quota. So, we did that and almost instantaneously MSFT quick obliged. 😊

Great, we are back in business (Well sort of).. After several more attempts we ran into more Quota issues with all three types (SKU, Static, and Basic) of Public IP Addresses . So, 3 more support tickets later and there was finally joy 😊 to the world🌎.

Next, we turned back to Ben who provided some great demos in his Pluralsight course on how to set up Data Virtualization for both a SQL Server and HDFS 🐘 files as a data sources on BDC

Data Virtualization (SQL Server)

Right‑mouse click ->Virtualize Data
Data virtualization wizard 🧙‍♂️ will launch.
Next step chooses data source

Create Master Key

Create connection and specify username and password

Next, the wizard🧙‍♂️ will connect to the source and a list of the tables and views
Choose the script option

Click on a specific table(s)

The external table inherits a schema from its source, and transformation would happen in your queries.

You can choose between just having them created or to generate a script.

CREATE EXTERNAL TABLE [virt].[PersonPhone]

(

[BusinessEntityID] INT NOT NULL,

[PhoneNumber] NVARCHAR(25) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,

[PhoneNumberTypeID] INT NOT NULL,

[ModifiedDate] SMALLDATETIME NOT NULL

)

WITH (LOCATION = N'[AdventureWorks2015].[Person].[PersonPhone]’, DATA_SOURCE = [SQLServer]);

Data Virtualization (HDFS🐘)

Right‑mouse‑click your HDFS 🐘 and create a new directory
Upload the flight delay dataset to it.
Expand the directory 📁 to see files 🗂

Right‑mouse‑click a file launches a wizard 🧙‍♂️ to virtualize data from CSV files

The wizard 🧙‍♂️will ask you for the database in which you want to create the external table, a name for the data source;
Next step preview of our data,
next step, the wizard 🧙‍♂️ recommends a column type

CREATE EXTERNAL DATA SOURCE [SqlStoragePool]

WITH (LOCATION = N’sqlhdfs://controller-svc/default’);

CREATE EXTERNAL FILE FORMAT [CSV]

WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = N’,’, STRING_DELIMITER = N'”‘, FIRST_ROW = 2));

CREATE EXTERNAL TABLE [csv].[airlines]

(

[IATA_CODE] nvarchar(50) NOT NULL,

[AIRLINE] nvarchar(50) NOT NULL

)

WITH (LOCATION = N’/FlightDelays/airlines.csv’, DATA_SOURCE = [SqlStoragePool], FILE_FORMAT = [CSV]);

Monitoring 🎛 Big Data Clusters through Azure Data Studio

BDC comes with a pre-deployed Grafana container for SQL Server and system metrics. This map server collects all those metrics from every single node, container, and pod and provides them individual dashboards 📊.

The Kibana dashboard 📊 is part of the Elastic Stack and provides a looking glass🔎 into all of your log files in BDC.

Troubleshooting Big Data Clusters through Azure Data Studio

In ADS, on the main dashboard 📊 there is a button Troubleshoot. This provides a library📚 of notebooks 📒 that you can use to troubleshoot and analyze the cluster. Notebooks 📒 are categorized and provide all kinds of different aspects on monitoring 🎛 of a diagnosing to help you repair 🛠 an issue within your cluster.

In addition, the azdata utility can be monitoring 🎛, running queries and notebooks 📒, and retrieve a cluster’s endpoints. the namespace and username as well.

We really enjoyed spending time learning SQL Server 2019 Big Data Cluster. 😊

“And I feel like it’s all been done Somebody’s tryin’ to make me stay You know I’ve got to be movin’ on”✈️

Below are some of the destinations I am considering for my travels for next week:

Google Cloud Certified Associate Cloud Engineer Path

Thanks –

–MCS

Week of September 18th

Posted on September 18, 2020 by Mark Shay

“Catch the mist, catch the myth…Catch the mystery, catch the drift”

L’shanah tovah! 🎉

So after basking in last week’s accomplishment of passing the AZ-900: Microsoft Azure Fundamentals certification exam 📝 this week we decided that we need come down from the clouds☁️☁️. Now, grounded we knew exactly where we needed to go. For those who know me well, I have spent a large part of my career supporting and Engineering solutions for Microsoft SQL Server. Microsoft released SQL Server 2019 back in November of the same year.

For the most part, we really haven’t spent too much on our journey going deep into SQL Server. So it has been long overdue that we give SQL Server 2019 a looksy 👀 and besides we needed to put the S-Q-L back in SQL Squirrels 🐿

SQL Server 2019 had been Microsoft’s most ambitious release of the product that has been around now for just a little over 25 years. 🎂 The great folks at MSFT built on previous innovations in the product to further enhance the development of languages, data types, on-premises or cloud ☁️ environments, and operating systems. Just a few of the big headline features of SQL Server 2019 are:

Intelligent Query Processing (IQP)
Memory-optimized TempDB
Accelerated Database Recovery (ADR)
Data Virtualization with PolyBase
Big Data Clusters

So let’s un-peel the onion 🧅 and begin diving into SQL Server 2019. So, where to begin? Full disclosure, I am kind of a bit of SQL Server 2019 internals geek 🤓. So we started with a tour of some of the offerings in Intelligent Query Processing

The IQP feature in SQL Server 2019 provides a broad impact that improves the performance💥 of existing workloads with minimal implementation effort to adopt. In theory, you just simply change the database compatibility level of your existing user databases when you upgrade to 2019 and your query just works faster 🏃 with no code changes! Of course this sounds 📢 almost too good to be true?

“Here we are now, entertain us”

To take us through an unbiased lens 👓 at IQP is no other than the man, the myth, the legend, Microsoft Certified Master Brent Ozar. Brent produced two amazing presentations for the SQL Server community, What’s New in SQL Server 2019 and The New Robots in SQL Server 2017, 2019 and Azure.

Brent with his always quirky, galvanic, and sometimes cynical style gives us the skinny on the Good 😊, the Bad 😞, and the Ugly😜 on IQP.

“I never played by the rules, I never really cared…My nasty reputation takes me everywhere“

First we began, looking at Table Variables. For those who have been around the SQL Server block 🧱, table variables have been around since the good o’l days of SQL 2000 as a performance 💥 improvement alternative to using temporary tables.

However, SQL Server doesn’t maintain statistics on table variables which as you probably know the query optimizer uses to determine the best execution plan. Basically, the SQL Server query optimizer just assumes that a table variable(s) only has one row whether it does or doesn’t. So performance wasn’t so great to say the least. Thus, Table Variables don’t have the best reputation.

Well, SQL Server 2019 to rescue! (Sort of..) In SQL Server 2019, by simply switching the Database “compat mode” to SQL Server 2019 (15.x) SQL Server, SQL Server will now use the actual number of rows in the table variable to create the plan. This is just fantastic! However, as Brent diligently points out this nifty little enhancement isn’t all Rainbows 🌈 and Unicorns 🦄 . Although it does solve the pesky estimated row problem with the query optimizer has with table variables, but now introduces the enigma that is parameter sniffing.

Parameter sniffing has plagued SQL Server since the beginning of time 🕰 and is one of the more common performance💥 issues we SQL people encounter. In simplest terms, SQL Server tries to optimize the execution plan by leveraging a previously ran🏃 plan used. The problem lies when there is different results set due to the parameters provided. Overall, this feature is definite improvement but not the silver bullet 🚀.

“Memories may be beautiful and yet…”

Next, we took a look 👀 @ the Memory Grant Feedback (row mode) feature which originally made its debut in SQL Server 2017. A memory grant is used by the SQL Server database engine to allocate how much memory it will use for a given query.

Sometimes it can allocate too much or too little than the actual memory needed. Obviously, if too much memory is allocated than there will be less memory available for other processes, and if too little is allocated than memory will spill 💦 to disk and performance will suffer.

“Here I come to save the day!” 🐭

Once again, by simply switching the Database “compat mode” to SQL Server 2019 (15.x), SQL Server will now automatically adjust the amount of memory granted for a query. Awesome! … but what’s the catch? Well, similar to the new Table variable enhancement situation, memory grants are sensitive to the dreaded parameter sniffing as SQL Server will base its decision making on the previous run query. In addition, If you rebuild your indexes or create new indexes, the adaptive memory grants feature will forget everything it learned about previously run plans and start all over. 😞 This is an improvement to the past performance of memory grants but unfortunately not the panacea.

“I will choose a path that’s clear.. I will choose free will”

Further, we next explored another feature that first made an appearance in SQL Server 2017 in Adaptive Joins. Back then this sleek feature was only available for Columnstore index but now in SQL Server 2019 is available for our run of the mill 🏭 b-tree 🌲 or row-type work loads. In SQL Server, with Adaptive Joins the query optimizer will dynamically determines at runtime the threshold number of rows and then it chooses between a Nested Loop or Hash Match join operator. Again, we simply switch the Database “compat mode” to SQL Server 2019 (15.x) and abracadabra!

This is awesome! But… Well, once again parameter sniffing rears its ugly 😜 head. As Brent astutely points out during his presentation and documents in his detailed blog post when dealing with complex queries often times SQL Server will produce several plans. So when SQL Server 2019 tries to solve this problem with more options for the optimizer to choose from it will in some particular cases might backfire 🔥 depending on your workloads.

“Just when I thought our chance had passed..You go and save the best for last”

The last feature we keyed 🔑 on as part of IQP was Batch Mode execution. This is just another great innovation that came straight out Columnstore Index (SQL 2012). Now offered in SQL Server 2019, users can take advantage of this enhancement without needing to use a Columnstore Index.

Batch mode is a huge performance 💥 enhancement especially with CPU-bound queries and for queries that use aggregation, sorts, and group by operations. Batch mode performs scans and calculations using batches. To enable the batch mode processing all you have to do is… thats right.. you guessed it… Switching the Database “compat mode” to SQL Server 2019 (15.x)

“Don’t cry Don’t raise your eye It’s only teenage wasteland”

Continuing our journey with SQL Server 2019.. We ventured to take a sneak peak 👀 at the improvements SQL Server 2019 made to TempDB. Guiding us through the murky terrain of TempDB was SQL Royalty👸 and Microsoft Certified Master Pam Lahoud. Pam presented in two short but very thorough videos as part of the Data Exposed series:

In these great videos, Pam gives us the lowdown on improvements to TempDB in SQL Server 2019. TempDB is of course one of the famous or infamous system databases as part of SQL Server. As SQL Server, has been enhanced over the last several decades more and more “stuff” has been chucked into TempDB. TempDB has often been referred to as the “Wastelands”. Some of the activities that go on in TempDB are:

Temp Tables
Table Variables
Cursors
DBCC CHECKDB
Table-Valued Functions
Row Versions
Online Index Operations
Sorts
Triggers
Statistics Updates
Hash Worktables
Spools

As result of all these happenings in SQL Server, we often might experience some pain 😭 in Object allocation contention, Metadata contention, and Temp table cache contention.

“Look up in the sky! It’s a Bird 🕊 … It’s a Plane… It’s ..“

..SQL Server 2019. (Of course 😊)

In SQL 2019, there are two major improvements that impact TempDB performance. The first improvement made helps reduce some of the Temp table cache contention issues. SQL 2019 intelligently partitions cache objects and optimizing cache look up.

To address object allocation contention issues SQL 2019 now offers concurrent PFS updates. In addition, by enabling concurrent PFS updates this allows us to have multiple threads share a latch on the PFS page and therefore we can have more more concurrent threads and less attention on those object allocation pages

The Next major enhancement is memory optimized metadata tables. This is the big one that got into the brochure!

Under the hood, SQL Server 2019 moves the system objects into memory optimized tables that have latched free, lock free structures which greatly increases the concurrency that we can have against those metadata tables and that helps us alleviate that metadata contention.

By Default the Memory-Optimized TempDB Metadata feature is not turned out but its quite easy to enable:

ALTER SERVER CONFIGURATION SET MEMORY_OPTIMIZED TEMPDB_METADATA = ON;

For those who see a flurry of tempdb activity, this feature seems like a no brainer 🧠.

“Makes us want to stay, stay, stay… For awhile”

Below are some of the places I am considering for my explorations for next week:

Continuing with SQL Server 2019
Google Cloud Certified Associate Cloud Engineer Path

Thanks –

–MCS

Week of August 21st

Posted on August 21, 2020 by Mark Shay

Pulling mussels from a shell

Happy Poet’s 📜 Day!

“I’ve been too long, I’m glad to be back”

So, after a 2-week hiatus we are glad to be back and all powered⚡️up and ready to get back on our continuous journey ✈️in learning. In our most recent learning, we have spent time🕰️ in the clouds ☁️☁️ (in particular AWS) so we thought we might circle⭕ back to an oldie but goodie with PowerShell⚡️🐚. Why PowerShell⚡️🐚 you might ask? That’s seems so yesterday. However, it’s still very much still today. Microsoft has been very much committed to PowerShell⚡️🐚 especially in regards to managing Windows. Every application from Exchange to SQL to Active Directory is built with PowerShell⚡️🐚 as a basis for administration. In addition, Cloud ☁️ solutions such as Office 365 and Azure integrate with PowerShell⚡️🐚.

When we first started with PowerShell⚡️🐚 around circa ~2012 with version 3 and then of course with version 4 which debuted just a year later it was all the rage. Just a quick a review for those not familiar with PowerShell⚡️🐚. PowerShell⚡️🐚 is a management framework that combines a command-line shell 🐚 and scripting language allowing IT professionals🧑‍💻the ability to perform real‑world🌎 tasks such as gathering remote computer 🖥️ information, automate repetitive tasks, and troubleshooting system problems. Initially, PowerShell⚡️🐚 was built for Windows only running🏃‍♂️ on top of the Windows .NET Framework but in 2016 Microsoft made the strategic decision and pivoted to offer PowerShell⚡️🐚 as Open Source through PowerShell Core running🏃‍♂️ on top of course .NET Core which is an open source framework that runs 🏃 on Linux🐧, macOS🍎 and Windows .

So where shall we begin? In the past, we have found when reviewing a familiar technology solution that the best place to start is with the basics and we found the perfect course on Pluralsight through Michael Bender’s PowerShell: Getting Started. Michael’s excellent course provides a strong fundamental knowledge of PowerShell⚡️🐚 with a personal guarantee that upon completion of the course you will be able to “hit the ground running 🏃 with usable PowerShell⚡️🐚 ninja🤺 skills.”

In the course introduction, Michael succinctly discusses why PowerShell⚡️🐚 is more than a scripting language and a command line interface but an execution engine that provides the ability for you to interface with your environment using a variety of tools 🔧. First, he touches on the traditional Windows PowerShell console that allows us to run many different commands i.e. Get‑Service. Then he dives into PowerShell Core which pretty much lets you do much of the same things but with the flexibility to run across multiple platforms. Then he discusses the tools🔧 we can use to developed PowerShell⚡️🐚 scripts with the legacy Windows PowerShell⚡️🐚 ISE (Integrated Scripting Engine) and the newer Visual Studio Code. Next, he discusses Windows Admin Center which uses PowerShell⚡️🐚 on the back-end to do all of the administration on Windows Server and also integrates with Microsoft Azure☁️.

Now after a great introduction, we were ready to dive right into the PowerShell⚡️🐚 Basics. At a very basic level with PowerShell⚡️🐚 follows a Verb‑Noun syntax. In other words, you do something to something. For example, if you wanted to get information about the verbs that are available for use in PowerShell⚡️🐚. You would use a cmdlet called “Get‑Verb”. Another important piece to PowerShell⚡️🐚 commands is parameter(s). Parameters are used to pass information into PowerShell⚡️🐚 commands so that we have information that we can use for the command to use to do its work.

Ex: Get‑Service ComputerName

Next, Michael introduces us to three of the most important commands that everyone should know which of course is Get‑Command, Get‑Help, and Get‑Member. These commands are so significant because they allow you to find the answers you need to work within PowerShell⚡️🐚. So, there is no need to Google or Bing to find the right syntax. Everything is right there in the console. Also, it’s really the best way to learn how to do something within PowerShell⚡️🐚.

· Get‑Command is used to search for the installed commands within PowerShell⚡️🐚.

· Get‑Help allows us to see how we use a specific command that we found so it displays the help information.

· Get‑Member allows us to get the properties and methods of objects that are the output of a specific command

PowerShell⚡️🐚 is an object‑oriented language. Unlike other scripting languages that rely on syntax to get things done, PowerShell⚡️🐚 uses objects as its output and objects have properties that make them up and they have methods that you can perform actions against them. The best way to visualize objects in PowerShell⚡️🐚 is to view the data in a table format. PowerShell⚡️🐚 places all of the data from commands into a collection or a table to store that data.

Next, we took a deep dive with pipeline in PowerShell⚡️🐚 which is really where the real power in PowerShell⚡️🐚 comes in. PowerShell⚡️🐚 treats all data as objects that can be used to their full potential in PowerShell⚡️🐚. So pipe-lining PowerShell⚡️🐚 is a way to send the output of one command into a second command, and this allows you to do more complex work like sort or filter something, and then from that output, we can do something.

Ex: Get-Service | where {$_.Status -eq “Stopped”}

Then the course further discusses one of the most common use cases for PowerShell⚡️🐚 which is gathering system information. In PowerShell⚡️🐚 there are several options. The first option is Windows Management Instrumentation (WMI). WMI is based on the Common Information Model, an open standard that defines how managed elements in an IT environment are represented as a common set of objects, as well as the relationship between them. The second option is CIM. CIM was originally introduced in PowerShell⚡️🐚 v3 to work with WMI. CIM cmdlets are now the de facto standard and the WMI cmdlets are now considered legacy as there has been no recent development or enhancements to WMI. When WMI Cmdlets are being called information is accessed through namespaces in the WMI repository. CIM is namespace that for the specifics classes that we’re looking for. An example of a classes is Win32_Processor which contains information like device ID and name for our processors. This information is stored as properties that are accessible from the objects output by the command.

Now that we are comfortable using PowerShell⚡️🐚 on local systems, Michael now discusses how we can use PowerShell⚡️🐚 to connect remote systems which is generally how PowerShell⚡️🐚 is used as much of our troubleshooting and problem resolution happens remotely. PowerShell⚡️🐚 has a few options for remoting, WMI and Windows Remote management (WinRM). WinRM is a Web Service for Management Protocol (WS Man) which allows users to run PowerShell⚡️🐚 commands on remote computers 🖥️. Both options are available on Windows PowerShell and PowerShell Core. PowerShell⚡️🐚 remoting allows you to send commands to a remote machine on your network. WinRM is responsible for maintaining the connections between these two systems. The computers 🖥️ you want to connect to need must have listener set up so that WinRM knows to listen for the power shell connections. By default, Windows clients don’t have PowerShell⚡️🐚 remoting turned on. So, it must be enabled if you plan to use PowerShell⚡️🐚 for remote Administration.

Enable-PSRemoting needs to be enabled on the target machine

Next, you need to give the user access to PowerShell⚡️🐚 remoting Enter-PSSession

This allows you to modify the session permissions and this will allow the remote connection to happen for PowerShell core. You need to set up the remote system to be an endpoint for power shell connections. This is done by installing a script located in the PS Home Directory

To put the finishing touches on this excellent introductory course Michael walks us through on how to Build a User Inventory Script with PowerShell⚡️🐚

By default, PowerShell⚡️🐚 ‘s execution policy is set to Restricted; which means that scripts will not run.

The Set-ExecutionPolicy cmdlet enables you to determine which Windows PowerShell⚡️🐚 scripts will be allowed to run on your computer.

Windows PowerShell⚡️🐚 has four different execution policies:

Restricted – No scripts can be run. Windows PowerShell⚡️🐚 can be used only in interactive mode.
AllSigned – Only scripts signed by a trusted publisher can be run.
RemoteSigned – Downloaded scripts must be signed by a trusted publisher before they can be run.
Unrestricted – No restrictions; all scripts can be run.

After completing this amazing course, we are now armed with foundational knowledge on how to use PowerShell⚡️🐚. So,next we decided to continue our review of PowerShell⚡️🐚 and how we can utilize it with SQL Server to perform common DBA and Developer tasks. Once again, we turned to Pluralsight through SQL Server MVP Robert C. Cain fantastic course on PowerShell and SQL Server

Robert’s course provides the fundamentals of using PowerShell⚡️🐚 to manage SQL Servers. The course is designed as a six-part series covering basic DBA Tasks using just PowerShell⚡️🐚, an introduction to SQL Management Objects (SMO) and the SQL Provider, Basic DBA tasks using both SMO and SQL Provider, Development using SQL Provider, Development using SMO, and Real-World🌎 Examples, SQL PS, PowerShell⚡️🐚 Jobs

After providing us with a simple PowerShell⚡️🐚 email function “Send Easy Mail” that can be used with SQL Server notifications or alerts we take a look how to manage the SQL Server Services which we obviously find status of their services through “Get-Service” cmdlet. When using the Get-Service cmdlet you need to pass in several parameters.

Get-service -ComputerName $server | where {($_.name -like “MSSQL$*” -or $_.name -like “MSSQLSERVER” -or $_.name -like “SQL Server (*”)}

Next, we took a look at counters. As Database professionals many of us keep a customized list of counters to monitor our database environment. PowerShell⚡️🐚 makes it extremely easy to get counter information using the Get-Counter cmdlet.

A great example on we can use counters with PowerShell⚡️🐚 can be found on MSSQL Tips. WMI which we learned earlier stands for Windows Management Instrumentation. WMI and PowerShell⚡️🐚 can be used for a common task like monitoring how much disk space you have available on the system.

Another great example can be found on MSSQL Tips

PowerShell⚡️🐚 also makes it very easy to get information out of the event log using Get-EventLog.

See MSSQL Tips

Next, Robert introduces us to SQL Provider and the SQL Management Object (SMO). In order to use SMO effectively you will need to have a basic understanding of objects most of which received during Michael’s course

Just to briefly review, objects are based on classes, classes are generally referred to as the blueprint for creating an object. In PowerShell⚡️🐚, you would have instantiated an instance of that class. Instantiation refers to the process of creating a single object from the class. An instance in object terms refers to that particular object which you just instantiated. As we know, objects have properties. These properties hold information about the current state of an object. Objects can also have methods and method is similar to how stored procedure works. In other words, it will only work inside of its current database. Whenever you organize a group of objects together it’s known as a collection. For example, a database object would have a collection of table objects. Another important concept to grasp is objects can contain other objects and collections.

SMO is a set of. NET libraries specifically designed for working with SQL Server. These libraries stored in. NET DLLs are loaded with classes. From these classes you can create objects that parallel things in SQL Server.

The SQL Provider is a PSSnapin which is a compiled module written PowerShell⚡️🐚. The SQL Provider PSSnapin is collection of cmdlets specific to SQL Server.

By default, neither the SQL Provider or the SMO libraries are loaded into PowerShell⚡️🐚. So, they need to be added manually by executing the following syntax:

Install-Module -Name SqlServer

Install-Module -Name SqlServer -AllowClobber

As a basic example, we can use SQL Provider to connect to a SQL Server and find what instances are on that server. We can use Get-ChildItem cmdlet to return the instance information and then put that into an array. Then we can utilize foreach and for each object in these instances that’s returned so we can do our work against each child name contained in our instances array. SMO is a. NET library that Microsoft designed for working with SQL Server. At the very core of SMO, everything is a server object and an object is something as it exists. For example, a server object has a corresponding server associated with it. Each server has things on it such as backup devices, credentials and databases, etc. In addition to objects, SMO has collections which are basically an array of objects. The SMO model considers each instance to be an individual server. Because SMO starts at the instance level, there’s no really good way using SMO to go out and query all of your machines and find out what instances you’re running🏃‍♂️. However, we can use ADO. NET to get all our servers we want to manage.

After covering SQL Provider and SMO, Robert then show us through his custom script how we can use SQL Provider and SMO to perform tasks a DBA might do. Next, we looked at the relationship between SMO and the SQL Provider.

After performing DBA tasks with SQL Provider and SMO, we learned how to develop against SQL Server using just the SQL Provider. See Scripts

Next, we were provided the challenge of doing the same tasks in the previous module like creating databases but this time🕰️ using SMO. See Scripts

Finally, as we wrapped up Robert’s excellent series, we were provided real world🌎 example of using PowerShell⚡️🐚. Our mission was to find tables that contain Text, end text and image data types. See Scripts

As an encore, the course covers SQLPS. SQLPS is a special customized shell just for working with SQL Server. It was initially written between PowerShell⚡️🐚 v1 and PowerShell⚡️🐚 v2, so it’s bit of amalgamation. It most cases SQLPS should be avoided as it not the full PowerShell⚡️🐚 environment and especially should be avoided for SQL Server Agent Scheduled jobs as you should call to the full shell when executing SQL Server Scheduled Jobs. We really enjoyed our time🕰️ re-learning PowerShell⚡️🐚 and we were glad to be back on our learning Journey.

“The Magical Mystery Tour Is coming to take you away… Coming to take you away”

Thanks –

–MCS

Week of May 15th

Posted on May 15, 2020 by Mark Shay

“Slow down, you move too fast…You got to make the morning last.”

Happy International Day of Families and for those celebrating in the US Happy Chocolate Chip Day!

This week’s Journey was a bit of a laggard in comparisons to previous week’s journeys but still productive, nonetheless. This week we took a break from Machine Learning while sticking to our repertoire and with our reptilian programing friend. Our first stop was to head over to installing Python on Windows Server which we haven’t touched on so far.As we tend to make things more challenging than they need to be we targeted an Oldie but a goodie Windows Server 2012 R2 running SQL Server 2016. Our goal to configure a SQL Server Scheduled Job that runs a simple Python Script which seemed liked a pretty simple task. We found an nice example of this exact scenario on MSSQL Tips – Run Python Scripts in SQL Server Agent

First, we installed Python and followed the steps and lo and behold it didn’t work right away. To quote the great Gomer Pyle “Surprise, surprise, surprise”. No worries we had this… After a little bit of troubleshooting and trying to interpret the vague error messages in the SQL Server Agent Error log we got it working… In turns out, we had a multitude of issues ranging from the FID that was running the SQL Agent service not having the proper entitlements to the directory where the py script lived and the more important prerequisite of Python not being in the User Environment Variables for the Service account to know where to launch the executable. Once resolved, we were off to the races or at least we got the job working.

At this point we were feeling pretty ambitious, so we decided rather than using the lame MS Dos style batch file we would use a cool PowerShell Script as a wrapper for our python code for the job… Cool but not so cool on Windows Server 2012 R2. First, we started out with set-executionpolicy remotesigned command which needs to be specified in order to execute PowerShell but because we were using an old jalopy OS we had to upgrade the version of the .Net runtime as well as the version of PowerShell. Once upgraded and we had executed a few additional commands and then we were good to go…

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12

Install-PackageProvider -Name NuGet -RequiredVersion 2.8.5.201 -Force

Install-Module -Name SqlServer -AllowClobber

After spending a few days here, we decided to loiter a little bit longer and crank out some SQL maintenance tasks in Python like a simple backup Job. This was pretty straight forward once we executed a few prerequisites.

python -m pip install –upgrade pip

Pip install pip

pip install pyodbc

pip install pymssql-2.1.4-cp38-cp38-win_amd64.whl

pip install –upgrade pymssql

Our final destination for the week was to head back over to a previous jaunt and play with streaming market data and Python. This time we decided to stop being cheap and pay for an IEX account

Fortunately, they offer pay by the month option with opt out any time so it shouldn’t get too expensive. To get re-acclimated we leveraged Jupyter notebooks and banged out a nifty python/pandas/matlib script that generates the top 5 US Banks and there 5-year performance. See attachment.

“I have only come here seeking knowledge… Things they would not teach me of in college”

Below are some topics I am considering for my adventures next week:

Vagrant with Docker
Data Pipelines
- Google Cloud Pub/Sub (Streaming Data Pipelines)
- Google Cloud Data Fusion ( ETL/ELT)
Back to Machine Learning
ONTAP Cluster Fundamentals
Google Big Query
Python -> Stream Data from IEX -> Postgres
Data Visualization Tools (i.e. Looker)
ETL Solutions (Stitch, FiveTran)
Process and Transforming data/Explore data through ML (i.e. Databricks) .
Getting Started with Kubernetes with an old buddy (Nigel)

Stay Safe and Be well –

–MCS

Week of April 17th

Posted on April 17, 2020 by Mark Shay

“Seems so good… Hard to believe an end to it.. Warehouse is bare”

Greetings and Salutations..

My odyssey this week didn’t quite have the twists and turns of last week’s pilgrimage. Perhaps, it’s was a bit of hangover from “Holy Week” or just the sheer lassitude from the an over abundance of learning new skills during this time of quarantine?

Last weekend, I finally got a chance to tinker around with Raspberry PI. I managed to get it setup on my Wifi network.. Thanks to finding an HDMI cable to hook up to my 50′ Samsung TV. I also learned how to set VSCode on the Amanda’s Mac and connect via SSH to my PI with out using a password (creating a token instead) which is convenient little trick. In addition, I got to play around with the Sense HAT Python module and made the Raspberry PI display some trippy flashing lights with the on-top LED board add-on.

So after being on Cloud 9 most of the last week, I decided I should stay a little longer in the stratosphere and bang on the Google Cloud Platform or more endearing known as GCP and their Compute Engine. Again, I would rebuild my SQL Server 2016 Always On environment (previously created several weeks back on AWS and last week on Azure). Once again, the goal would be to compare and contrast now all 3 major cloud providers. In effort to avoid redundancy here, I won’t reiterate the same prerequisites taken to build out my SQL farm. However, I will share my observations with GCP

GCP – Compute Engine Issues – Observations:

More options than Azure but less then AWS – Interface was bit confusing to navigate through
1st instance built (Domain Controller) got corrupt whenI upgraded my account from Free to Paid Instance need to be rebuilt
Windows Domain failed to be created as Administrator account was set to blank (disabled on AWS EC2 and Azure VMs)
Disks can only be detached from an instance that is an “Stopped” state

Here are my final rankings based on the the 3 Cloud providers I evaluated

Pros

1. Best UI: AWS2.

2. Most Features: AWS

3. Easiest to Navigate UI: Azure

4. Most suited for Microsoft workloads (i.e. SQL Server): Azure

5. Most enticing to use it: GCP (free offer)

6. Most potential: GCP

Cons

1. Hardest to Navigate UI: GCP

2. Hardest to configure (security): AWS

3. Least amount of features: Azure

So after burning out my fuse up there alone… it was time to touch down and bring me round again.. And that led me back to SQL Server and Columnstore Indexes. Here is what I did:

Completed Joe Sack’s Pluralsight course: SQL Server 2012: Nonclustered Columnstore Indexes

Watched Row-based vs. Column-based Indexes Module in Kimberly L. Tripp awesome course in Pluralsight on SQL Server: Indexing for Performance

Watched Designing Columnstore Indexes for Analytic Queries Module in Gail Shaw’s Designing and Implementing SQL Server Database Indexes

Downloaded/Restored modified AdventureWorksDW2012
- Created 60 million rows for the Internet Sales Fact table
- Created Columnstore Index and benchmarked performance vs old Row based Index
- Created Table partition for Internet Sales Fact table and designed Partition Switching scenario (workaround used with SQL 2012 and non-clustered CIs) Old Lava Archive database design.

Started reading through Niko Neugebauer’s awesome blog posts on Columnstore – http://www.nikoport.com/columnstore/ (1- 11)

Next Steps..
Below are some topics I am considering for my voyage next week:

SQL Server Advanced Features:
- Best practices around SQL Server AlwaysOn (Snapshot Isolation/sizing of Tempdb, etc)
Data Visualization Tools (i.e. Looker)
ETL Solutions (Stitch, FiveTran)
Process and Transforming data/Explore data through ML (i.e. Databricks) .
Getting Started with Kubernetes with an old buddy (Nigel)

Stay safe and Be well

—MCS

Week of April 10th

Posted on April 10, 2020 by Mark Shay

“…When the Promise of a brave new world unfurled beneath a clear blue Sky”

“Forests, lakes, and rivers, clouds and winds, stars and flowers, stupendous glaciers and crystal snowflakes – every form of animate or inanimate existence, leaves its impress upon the soul of man.” — Orison Swett Marden

My journey for this week turned out to be a sort of potpourri of various technologies and solutions thanks to the wonderful folks at MSFT. After some heavy soul searching over the previous weekend, I decided that my time would be best spent this week on recreating the SQL Server 2016 with Always On environment (previously created several weeks back on AWS EC2) but in the MS Azure Cloud platform. The goal would be to better understand Azure and how it works. In addition, I would be able to compare and contrast both AWS EC2 vs. Azure VMs and be able to list both the pros and cons of these cloud providers.

But before I could get my head into the clouds I was still lingering around in the bamboo forests. This past weekend, I was presented with an interesting scenario to stream market data to pandas from the investors exchange (Thanks to my friend) . So after consulting with Mr. Google, I was pleasantly surprised to find that IEX offered an API that allows you to connect to there service and stream messages directly to Python and use Pandas for data visualization and analysis. Of course being the cheapskate that I am I signed up for a free account and off I went.

So I started tickling the keys, I produced a newly minted IEX Py script. After some brief testing, I started receiving an obscure error? Of course there was no documented solution on how to the address such an error..

So after some fruitless nonstop piping of several modules, I was still getting the same error. 🙁 After a moment of clarity of I deduced there was probably limitation on messages you can stream from the free IEX account..

So I took shot in the dark and decided to register for another account (under a different email address) this way I would receive a new token and give that a try

… And Oh là là! My script started working again! 🙂 Of course as I continued to add more functionality and test my script I ran back into the same error but this time I knew exactly how to resolve it.

So I registered for a third account (to yet again generate a new token ). FortunateIy, I completed my weekend project. See attachments Plot2.png and Plot3.png for pretty graphs

Now that I could see the forest through the trees and it was off to the cloud! I anticipated that it would take me a full week to explore Azure VMs but it actually only took a fews to wrap my head around it..

So this left me chance to pivot again and this time to a Data Warehouse/ Data Lake solution built for the Cloud. Turning the forecast for the rest of the week to Snow.

Here is a summary of what I did this week:

Sunday:

Developed Pandas/Python Script in conjunction with iexfinance & matplotlib modules to build graphs to show historical price for MSFT for 2020 and comparison of MSFT vs INTC for Jan 2nd – April 3rd 2020

Monday: (Brief summary)

Followed previous steps to build the plumbing on Azure for my small SQL Server farm (See Previous status on AWS EC2 for more details)

Created Resource Group
Create Application Security Group
Created 6 small Windows VMs in the same Region and an Availability Zone
Joined them to Windows domain

Tuesday: (Brief summary)

Created Windows Failover Cluster
Installed SQL Server 2016
Setup and configured AlwaysOn AGs and Listeners

Observations with Azure VMs:

Cons

Azure VMS are very slow first time brought up after build
Azure VMS has a longer provisioning time than EC2 Instances
No UI option to perform bulk tasks (like AWS UI) . Only option is Templating thru scripting
Can not move Resource Group from one Geographical location to another like VMs and other objects within Azure
When deleting a VM all child dependencies are not dropped ( Security Groups, NICs, Disks) – Perhaps this is by design?

– Objects need to be dissociated with groups and then deleted for clean up of orphan objects

Neutral

Easy to migrate VMs to higher T-Shirt Sizes
Easy to provision Storage Volumes per VM
Application Security Groups can be used to manage TCP/UDP traffic for entire resource group

Pros

You can migrate existing storage volumes to premium or cheaper storage seamlessly
Less network administration
- less TCP/UDP ports need to be opened especially ports native to Windows domains

Very Easy to build Windows Failover clustering services
- Natively works in the same subnet
- Less configuration to get Connectivity to working then AWS EC2

Very Easy to configure SQL Server 2016 Always On
- No need to create 5 Listeners (different per subnet) for a given specific AG
- 1 Listener per AG

Free Cost, Performance, Operation Excellence Recommendations Pop up after Login

Wednesday:

Registered for an Eval account for Snowflake instance
Attended Zero to Snowflake in 90 Minutes virtual Lab
- Created Databases, Data Warehouses, User accounts, and Roles
- Created Stages to be used for Data Import
- Imported Data Sources (Data in S3 Buckets, CSV, JSON formats) via Web UI and SnowSQL cmd line tool
- Ran various ANSI-92 T-SQL Queries to generate reports from SnowFlake

Thursday:

Completed Pluralsight course Snowflake? Why Should I Care?
Watched the following YouTube Videos:
See Snowflake in 8 Minutes
Snowflake Architecture – Learn How Snowflake Stores Table data
Started Series: Getting Started – Introduction to Worksheets & Queries
Imported CitiBike System Data
Configured the SnowFlake Python Module
Developed a Pandas/Python Script using snowflake.connector & matplotlib modules to build a graph to show Citibike total rides over 12 month period (in descending order by rides per month) . See attachment plot.png

Friday:

**Bonus Points **

More Algebra – Regents questions.
More with conjugating verbs in Español (AR Verbs)

Next Steps..
Below are some topics I am considering for my voyage next week:

SQL Server Advanced Features:

– Columnstore Indexes
– Best practices around SQL Server AlwaysOn (Snapshot Isolation/sizing of Tempdb, etc)

Data Visualization Tools (i.e. Looker)
ETL Solutions (Stitch, FiveTran)
Process and Transforming data/Explore data through ML (i.e. Databricks) .
Getting Started with Kubernetes with an old buddy (Nigel)

Stay safe and Be well

—MCS

Week of March 20th

Posted on March 20, 2020 by Mark Shay

Hey, you, get off of my cloud”! (that includes you Corona)

So after taking a brief victory lap for getting my virtual Windows Domain Sandbox up and running last week in AWS EC2. This week I attempted to get SQL Server AlwaysOn running with multiple Availability Groups and multiple Replicas (Async w/Read, Async w/ No read, Sync with Automatic Failover and Sync with Read Automatic Failover options)

After spending multiple days and countless attempts, I was left with numerous disappointments for most of the week as I fruitlessly chased numerous red herrings with non-useful Windows Cluster Log errors.

However, I refused to accept defeat! So I am happy to report that I finally persevered for this week’s mission

Here is what I did:

Removed my flaky implementation of AlwaysOn AGs and Listener with some lame hacks that had my AlwaysOn Implementation sudo running earlier in the week

Destroyed Windows Failover Cluster

Began the recovery mission:

Allocated 5 new class C subsets to existing VPC

Added additional Policies to Security Group in VPC
- Added UDP port for SQL Browser Service
- Added TCP ports used with AlwaysOn

Added a second virtual NIC to designated 5 SQL Severs EC2 instances

Configured 2nd IP for 2nd NIC using unique IP from designated Class C subnet

Designated DC/DNS Server as my Jump Server as all SQL EC2 instances are no longer routable from internet thru RDP

Recreated Windows Failover Cluster with 5 SQL Servers

Added 5 virtual Network IPs (1 for each Class C subnet) to be used with Windows Cluster.

Recreated Always On with 2 AGs with ASync w/Read, ASync w/ No read, Sync with AF and Sync with Read AF

Configured SQL Listener to use 5 IPs in each subnet that hosts SQL Servers.

Added 5 Network IP under AG resources in Windows Cluster to be used with SQL Listener

Enabled SQL Browsing Service on all 5 SQL EC2 instances (As I tried to be cute and make all SQL instances a Named Instances instead using default)

Removed all SQL Aliases previously created as they are no longer needed

Resolved “nslookup” issues that were plaguing name resolution on my Network

Added 5 new Class C subnets to Reverse Lookup Zones in DNS

Added Windows Cluster and SQL Listener Names for each subnet to DNS Forward Lookup Zones

Successfully Tested failover scenarios w/ SQL AlwaysOn

Installed and configured Redgate SQL Data Generator software on Jump Server

Successfully Tested various scenarios with pumping millions of records to the SQL Listener while initiating AlwaysOn failover

Next Steps..

Play around a little bit more with SQL AlwaysOn
Maybe take on Snakes and Bears (Start by watching some videos)…
Stayed Tuned.

Peace Out –
–MCS

Week of March 13

Posted on March 13, 2020 by Mark Shay

Forget Corona! Take me to the cloud

Creating and Configuring a SQL Server AlwaysOn Configuration in AWS EC2

Created VPC in AWS
- Designated Private Subnet address
- Specified proper routes

Created Security Groups
- Specified specific Protocols, Ports, Source/destination

Created 6 micro AWS Instances of Windows 2016 DC Edition
- 1 Domain Controller
- 1 Primary Database Server
- 4 Replica Database Servers

Configured Windows Active Directory, DNS
- Joined Windows Domain
- Created FID to be used with SQL Server Services

Established connectivity between “MCSSYSTEMS” AWS Instances
- Configured virtual NICs to point internal DNS and use static IP

Configured additional storage volumes
- 15 (300 GB) Volumes – 3 volumes (Data, Log, TempDB ) to be assigned to 5 designated SQL Servers)
- Attached storage to AWS Instances to be used with SQL Server
- Brought Disks online as Dynamic disks
- Formatted Disks (64K)

Migrated 5 AWS micros instances to medium t-shirt size Instance so to be used with SQL Server
- re-configured virtual NICs to point internal DNS and use static IP (as all settings were lost during migration)

Installed SQL Server 2016 w/ SP2 and SSMS 2018 on 5 Instances
- Disabled Windows FW
- enforced best practices (separation of Data, Log, Temp)
- Created multiple Temp DB files to match v CPUs
- Put SQL FID in Lock Pages in Memory Policy, instant file initialization
- Configured static port # to be used with SQL Server instances
- Configured SQL Server alias on instances can connect to friendly name (DNS is flaky)

Began SQL Server AlwaysOn Prep work
- Install/Configured Windows Fail Over Cluster Services on 5 SQL Server AWS Instances
- Enable each SQL Server for Always on availability groups
- verified all SQL Servers are running properly and prepped for Always

Next Steps..

Download and attach AdventureWorks database (Sample)
Configure AlwaysOn, Availability Group(s), Listener
Create Primary, Secondary, and Read Only Replicas, etc
Consult with the Big Guns