Week of May 22nd

And you know that notion just crossed my mind…

Happy Bitcoin Pizza Day!

All aboard! This week our travels would take us on the railways far and high, but before we could hop on the knowledge express, we had some unfinished business to attend to.

“Oh, I get by with a little help from my friends”

If you have been following my weekly submissions for the last few weeks, you’ll know I listed as a future action item “create/configure a solution that leverages Python to stream market data and insert it into a relational database.”

Well, last week I found just the perfect solution. A true masterpiece by Data Scientist/Physicist extraordinaire AJ Pryor, Ph.D. AJ had created a brilliant multithreaded work of art that continuously queries market data from IEX and then writes it to a PostgreSQL database. In addition, he built a data visualization front-end that leverages Pandas and Bokeh so the application can run interactively through a standard web browser. It was like a dream come true! Except that the code was written about three years ago and referenced a deprecated API from IEX.
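For the curious, the general pattern looks roughly like this: poll the IEX Cloud quote endpoint and insert each reading into Postgres. This is my own stripped-down sketch, not AJ’s code; the table name, token handling, and polling interval are all assumptions.

```python
# Minimal sketch of the polling pattern (not AJ's actual code): pull a quote
# from the IEX Cloud REST API and insert it into PostgreSQL. The table name,
# token variable, and endpoint parameters here are illustrative assumptions.
import os
import time

import requests
import psycopg2

IEX_TOKEN = os.environ["IEX_TOKEN"]          # free-tier publishable token
BASE_URL = "https://cloud.iexapis.com/stable"

conn = psycopg2.connect("dbname=stocks user=postgres")
conn.autocommit = True

def poll_quote(symbol: str) -> None:
    """Fetch the latest quote for `symbol` and persist it."""
    resp = requests.get(
        f"{BASE_URL}/stock/{symbol}/quote",
        params={"token": IEX_TOKEN},
        timeout=10,
    )
    resp.raise_for_status()
    quote = resp.json()
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO quotes (symbol, price, volume, quoted_at) "
            "VALUES (%s, %s, %s, to_timestamp(%s / 1000.0))",
            (quote["symbol"], quote["latestPrice"],
             quote["latestVolume"], quote["latestUpdate"]),
        )

while True:
    for sym in ("MSFT", "INTC"):
        poll_quote(sym)
    time.sleep(5)   # stay well under the free-tier message limit
```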

Ok, no problem. We will just simply modify AJ’s “Mona Lisa” to reference the new IEX API and off we will go. Well, what seemed like a dream turned into a virtual nightmare. I spent most of last week spinning my wheels trying to get the code to work, but to no avail. I even reached out to the community on Stack Overflow, but all I received was crickets.

Just as I was ready to cut my losses, I reached out to a longtime good friend who happens to be an all-star programmer and a fellow NY Yankees baseball enthusiast. Python isn’t his specialty (he is really an amazing Java programmer), but he offered to take a look at the code when he had some time. So we set up a Zoom call this past Sunday and I let his wizardry take over. After about an hour or so he was in a state of flow and had a good pulse on what our maestro AJ’s work was all about. After a few modifications my good chum had the code working and humming along. I ran into a few hiccups along the way with the Bokeh code, but my confidant just pointed me to some simpler syntax and then abracadabra… this masterpiece was now working on the Mac!

As the new week started, I was still basking in the radiance of this great coding victory. So, I decided to be a bit ambitious and move this gem to the cloud, which would be the crème de la crème of our learnings thus far. Cloud, Python/Pandas, streaming market data, and Postgres all wrapped up in one! Complete and utter awesomeness!

Now the question was which cloud platform to go with. We were well versed in the compute offerings of all three of the major providers as a result of our learnings.

So with a flip of the coin, we decided to go with Microsoft Azure. That, and we still had some free credits available.

With sugar plum fairies dancing in our heads, we spun up our Ubuntu image and followed along with the well-documented steps in AJ’s GitHub project.

Now we were cooking with gasoline! We cloned AJ’s GitHub repo, modified the code with our new changes, ran the scripts, and just as we were ready to declare victory… a stack overflow error! Oh, the pain.

Fortunately, I didn’t waste any time; I went right back to my ace in the hole, but with some trepidation that I was being too much of an irritant.

I explained my perplexing predicament and, without hesitation, my fidus Achates offered some great troubleshooting tips, and quite expeditiously we had the root cause pinpointed. For some peculiar reason, the URL formatting that worked like a charm on the Mac gave Ubuntu on Azure a case of dyspepsia. It was certainly a mystery, but one that could only be solved by simply rewriting the code.
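I won’t pretend to remember the exact fix, but one way to sidestep that whole class of problem (purely a sketch, with a placeholder endpoint and parameters) is to let the requests library assemble and encode the URL rather than concatenating strings by hand:

```python
# Illustrative only: letting requests build the query string avoids
# platform-specific quoting/encoding surprises that hand-built URLs can hit.
import requests

def fetch_chart(symbol: str, token: str, range_: str = "1m") -> list:
    resp = requests.get(
        f"https://cloud.iexapis.com/stable/stock/{symbol}/chart/{range_}",
        params={"token": token},   # encoded safely on every platform
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()
```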

Once again, my comrade in arms helped me through another quagmire. So, without further ado, may I introduce to you the one and only…

http://stockstreamer.eastus.cloudapp.azure.com:5006/stockstreamer

We’ll hit the stops along the way… We only stop for the best

Feeling victorious after my own personal Battle of Carthage, and with our little streaming market data saga out of our periphery, it was time to hit the rails…

Our first stop was messaging services, which are all the rage nowadays. There are so many choices for data messaging services out there, so where to start? We went with Google’s Pub/Sub, which turned out to be a marvelous choice! To get enlightened with this solution, we went to Pluralsight, where we found an excellent course, Architecting Stream Processing Solutions Using Google Cloud Pub/Sub by Vitthal Srinivasan.

Vitthal was a great conductor who navigated us through an excellent overview of Google’s impressive solution and its use cases, and even touched on the rather complex pricing structure in our first lesson. He then took us deep into the weeds, showing us how to create topics, publishers, and subscribers. He went further by showing us how to leverage some other tremendous offerings in GCP like Cloud Functions, APIs & Services, and Storage.
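To give a flavor of what those lessons look like in Python, here is a minimal publish/subscribe sketch using the google-cloud-pubsub client; the project, topic, and subscription names are placeholders and are assumed to already exist, with credentials configured.

```python
# Rough sketch of the publish/subscribe flow. Assumes
# `pip install google-cloud-pubsub`, application-default credentials,
# and that the project, topic, and subscription below already exist.
from concurrent import futures

from google.cloud import pubsub_v1

PROJECT_ID = "my-gcp-project"          # placeholder
TOPIC_ID = "market-data"               # placeholder
SUBSCRIPTION_ID = "market-data-sub"    # placeholder

# Publisher side: push a message (bytes) plus optional string attributes.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC_ID)
future = publisher.publish(topic_path, b"MSFT,183.51", source="demo")
print("published message id:", future.result())

# Subscriber side: pull messages asynchronously and ack them.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)

def callback(message):
    print("received:", message.data, message.attributes)
    message.ack()

streaming_pull = subscriber.subscribe(subscription_path, callback=callback)
try:
    streaming_pull.result(timeout=30)   # listen for 30 seconds, then stop
except futures.TimeoutError:
    streaming_pull.cancel()
```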

Before this amazing course, my only exposure was limited to GCP’s Compute Engine, so it was an eye-opening experience to see the great power that GCP has to offer! To round out the course, he showed us how to use GCP Pub/Sub with some of the client libraries, which was an excellent tutorial on how to use Python with this awesome product. There were even two modules on how to integrate a Google Hangouts chatbot with Pub/Sub, but that required you to be a G Suite user. (There was a free trial, but I skipped the setup and just watched the videos.) Details on the work I did on Pub/Sub can be found at

“I think of all the education that I missed… But then my homework was never quite like this”

For bonus points this week, I spent an enormous amount of time brushing up on my 8th grade math and science curriculum:

  1. Linear Regression (see the quick sketch after this list)
  2. Epigenetics
  3. Protein Synthesis
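And since linear regression was on the homework list, here is a tiny refresher on fitting a line by least squares; the data points are made up purely for illustration.

```python
# Tiny refresher on fitting a line y = m*x + b by least squares.
# The data points are made up purely for illustration.
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

m, b = np.polyfit(x, y, deg=1)       # slope and intercept of best-fit line
predictions = m * x + b
r_squared = 1 - np.sum((y - predictions) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"y ~ {m:.2f}x + {b:.2f}, R^2 = {r_squared:.3f}")
```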

Below are some topics I am considering for my Journey next week:

  • Vagrant with Docker
  • Continuing with Data Pipelines
  • Google Cloud Data Fusion (ETL/ELT)
  • More on Machine Learning
  • ONTAP Cluster Fundamentals
  • Google BigQuery
  • Data Visualization Tools (i.e. Looker)
  • ETL Solutions (Stitch, Fivetran)
  • Processing and transforming data / exploring data through ML (i.e. Databricks)
  • Getting Started with Kubernetes with an old buddy (Nigel)

Stay safe and Be well –

–MCS 

Week of May 8th

“Now it’s time to leave the capsule if you dare..”

Happy Friday!

Before we could end our voyage and return our first mate Slonik to the zookeeper, we would first need to put a bow on our Postgres journey (for now) by covering a few loose ends on advanced features. On Saturday, we kicked it off with a little review of isolation levels in Postgres (including a deep dive on Serializable Snapshot Isolation (SSI)). Then it was on to third-party monitoring for database health and streaming replication, and, for la cerise sur le gâteau… declarative partitioning and sharding!
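As a quick refresher on what SSI means in practice, here is a tiny sketch (the connection string and the accounts table are placeholders): run two of these transactions concurrently against overlapping rows and one of them gets rolled back with a serialization failure that the application simply retries.

```python
# Two sessions running this block concurrently against overlapping rows will
# cause one of them to fail with SQLSTATE 40001; the standard remedy is retry.
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("dbname=demo user=postgres")   # placeholder DSN

def transfer(amount: int) -> None:
    while True:
        try:
            with conn:                       # one transaction per block
                with conn.cursor() as cur:
                    cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
                    cur.execute(
                        "UPDATE accounts SET balance = balance - %s WHERE id = 1",
                        (amount,),
                    )
                    cur.execute(
                        "UPDATE accounts SET balance = balance + %s WHERE id = 2",
                        (amount,),
                    )
            return
        except errors.SerializationFailure:
            # SSI detected a dangerous dependency; just try again.
            continue

transfer(100)
```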

Third-Party Monitoring

We evaluated two solutions, OpsDash and pgDash. Both were easy to set up, and both gave valuable information regarding Postgres. OpsDash provides more counters and can monitor system information as well as other services running on Linux, whereas pgDash is Postgres-specific and gives you a deeper look into Postgres and streaming replication than just querying the native system views.

Declarative Partitioning

It was fairly straightforward to implement declarative partitioning. We reinforced the concepts by turning to Creston’s plethora of videos on the topic as well as several blog posts. See below for the detailed log.
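For flavor, here is a minimal sketch of the kind of range-partitioned table we played with, driven from Python via psycopg2; the table name and date ranges are purely illustrative.

```python
# Minimal declarative-partitioning sketch: a parent table partitioned by
# range, two quarterly partitions, and an insert that gets routed for us.
import psycopg2

ddl = """
CREATE TABLE measurements (
    city_id   int  NOT NULL,
    logdate   date NOT NULL,
    reading   numeric
) PARTITION BY RANGE (logdate);

CREATE TABLE measurements_2020_q1 PARTITION OF measurements
    FOR VALUES FROM ('2020-01-01') TO ('2020-04-01');

CREATE TABLE measurements_2020_q2 PARTITION OF measurements
    FOR VALUES FROM ('2020-04-01') TO ('2020-07-01');
"""

with psycopg2.connect("dbname=demo user=postgres") as conn:   # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(ddl)
        # Rows are routed to the right partition automatically on insert.
        cur.execute(
            "INSERT INTO measurements VALUES (%s, %s, %s)",
            (1, "2020-05-08", 42.7),
        )
```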

Sharding Your Data with PostgreSQL

There are third-party solutions like Citus Data that seem to offer a more scalable approach, but out of the box you can implement sharding by setting up declarative partitioning on a primary server and using a foreign data wrapper configured against a remote server; combining partitioning and the FDW gives you sharding. This was quite an interesting solution, although I have strong doubts about how scalable it would be in production.
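Here is a rough sketch of how the partitioning and postgres_fdw pieces snap together; the server name, credentials, and ranges are placeholders, and a matching table still has to exist on the remote shard.

```python
# One partition of the parent table from the previous sketch is pushed to a
# remote server via postgres_fdw, so its rows are stored and queried remotely.
import psycopg2

shard_ddl = """
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER shard1
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.internal', dbname 'demo');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER shard1
    OPTIONS (user 'postgres', password 'secret');

-- This partition's rows live on (and are queried from) the remote server.
CREATE FOREIGN TABLE measurements_2020_q3
    PARTITION OF measurements
    FOR VALUES FROM ('2020-07-01') TO ('2020-10-01')
    SERVER shard1;
"""

with psycopg2.connect("dbname=demo user=postgres") as conn:   # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(shard_ddl)
```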

On Sunday, we took a much-needed respite, as the weather in NYC was very agreeable for an escape from the quarantine…

On Monday, with our rig now dry-docked, we would travel by different means to another dimension, a dimension not only of sight and sound but of mind. A journey into a wondrous land of imagination. Next stop, the DevOps Zone!

To begin our initiation into this realm, we would start off with HashiCorp’s Vagrant.

For those not familiar with Vagrant, it is not the transient mendicant the name would otherwise imply, but a nifty open-source solution for building and maintaining lightweight, portable dev environments.

It’s kind of similar to Docker, for those more familiar, but it generally works with virtual machines (although it can be used with containers).

At the most basic level, Vagrant works with slimmed-down VMs, whereas Docker is the most minimalistic option, isolating processes from the OS by leveraging containers.

The reason to go this route, as opposed to the more popular Docker, was that it is generally easier to stand up a dev environment.

With that being said, we wound up spending a considerable amount of time on Monday and Tuesday this week working on this, as I ran into some issues with SSH and the “vagrant up” process. The crux of the issue was running Vagrant/VirtualBox inside an Ubuntu VM that was itself running on VirtualBox on a Mac. This convoluted nested setup didn’t seem to play nice. Go figure?

Once we decided to install Vagrant with VirtualBox natively on the Mac, we were up and running and easily able to spin up and deploy VMs seamlessly.

Next, we played a little bit with Git, getting some practice with the workflow of editing configuration files and pushing the changes straight to the repo.

On Wednesday, we decided to begin our exploration of strange new worlds, to seek out new life and new civilizations, and of course boldly go where maybe some have dared to go before. That would be, of course, Machine Learning, where data is the oil and the algorithm is the engine. We would start off slow by just trying to grasp the jargon: training data, training a model, testing a model, supervised learning, and unsupervised learning.

The best way for us to absorb this unfamiliar lingo would be to head over to Pluralsight, where David Chappell offers a great introductory course, Understanding Machine Learning.

“Now that she’s back in the atmosphere… With drops of Jupiter in her hair, hey, hey”

On Thursday, we would go further down the rabbit hole of Machine Learning with Jerry Kurata’s Understanding Machine Learning with Python.

There we would be indoctrinated into the powerful tool that is the Jupyter Notebook. Now armed with this great “Bat gadget”, we would reunite with some of our old heroes from the “Guardians of the Python” like “Projectile” Pandas, matplotlib “the Masher”, and of course NumPy, “the redheaded stepchild of Thanos”. In addition, we would be introduced to a new and special superhero: scikit-learn.

For those not familiar with this powerful library, scikit-learn has unique and empathic powers that complement our friends NumPy, Pandas, and SciPy. This handy library ultimately unlocks the key to the Machine Learning universe through Python.
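To give a taste of why it unlocks so much, here is the kind of first exercise the course builds toward: split the data, train a model, test the model. The bundled iris dataset stands in here for the course’s own data.

```python
# First-steps sketch: train/test split, fit a simple classifier, score it.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                             # "training the model"
print("test accuracy:", model.score(X_test, y_test))    # "testing the model"
```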

Despite all this roistering with our exemplars of Python, our voyage wasn’t all rainbows and unicorns.

We were introduced to all sorts of new space creatures like Bayesian and Gaussian algorithms, each conjuring up its own bêtes noires. The mere thought of Bayes’ theorem dredged up old memories buried deep in the bowels of my college probability studies, and the mere mention of Gaussian functions jarred memories of The Black Swan (not the ballet movie with fine actresses Natalie Portman and Mila Kunis, but the well-written and often irritating NYT bestseller by Nassim Nicholas Taleb).

Unfortunately, it didn’t get any cozier when we started the section of the course on the powerful and complex ensemble that is the Random Forest algorithm. There we got bombarded by meteorites such as “overfitting”, “regularization hyperparameters”, and “cross-validation”, not to mention the dreaded “bias-variance tradeoff”. Ouch! My head hurts…
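For the brave, here is a small sketch that ties the scary vocabulary together: a Random Forest whose hyperparameters act as regularization, scored with 5-fold cross-validation to keep overfitting honest. Again, the iris dataset is only a stand-in.

```python
# Random Forest with cross-validation: n_estimators and max_depth are the
# hyperparameters; limiting depth trades variance for a little bias.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=100,     # number of trees in the ensemble
    max_depth=3,          # shallow trees as a simple regularizer
    random_state=42,
)

scores = cross_val_score(forest, X, y, cv=5)   # 5-fold cross-validation
print("fold accuracies:", scores.round(3))
print("mean accuracy:", scores.mean().round(3))
```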

Here is the detailed log of my week’s journey

“With so many light years to go…. And things to be found (to be found)”

Below are some topics I am considering for next week’s odyssey:

  • Run Python Scripts in SQL Server Agent
  • More with Machine Learning
  • ONTAP Cluster Fundamentals
  • Google BigQuery
  • Python -> Stream Data from IEX -> MSSQL
  • Data Visualization Tools (i.e. Looker)
  • ETL Solutions (Stitch, Fivetran)
  • Processing and transforming data / exploring data through ML (i.e. Databricks)
  • Getting Started with Kubernetes with an old buddy (Nigel)

Stay safe and Be well

—MCS 

Week of April 10th

“…When the Promise of a brave new world unfurled beneath a clear blue Sky”

“Forests, lakes, and rivers, clouds and winds, stars and flowers, stupendous glaciers and crystal snowflakes – every form of animate or inanimate existence, leaves its impress upon the soul of man.” — Orison Swett Marden

My journey for this week turned out to be a sort of potpourri of various technologies and solutions, thanks to the wonderful folks at MSFT. After some heavy soul searching over the previous weekend, I decided that my time would be best spent this week recreating the SQL Server 2016 with Always On environment (previously created several weeks back on AWS EC2), but on the MS Azure cloud platform. The goal would be to better understand Azure and how it works. In addition, I would be able to compare and contrast AWS EC2 vs. Azure VMs and list the pros and cons of each cloud provider.

But before I could get my head into the clouds, I was still lingering in the bamboo forests. This past weekend, I was presented with an interesting scenario: stream market data into pandas from the Investors Exchange (thanks to my friend). So after consulting with Mr. Google, I was pleasantly surprised to find that IEX offers an API that allows you to connect to their service, stream messages directly into Python, and use Pandas for data visualization and analysis. Of course, being the cheapskate that I am, I signed up for a free account and off I went.

So I started tickling the keys and produced a newly minted IEX Python script. After some brief testing, I started receiving an obscure error. Of course, there was no documented solution on how to address such an error…

So after some fruitless nonstop pip installs of several modules, I was still getting the same error. 🙁 After a moment of clarity, I deduced there was probably a limit on the number of messages you can stream with a free IEX account…

So I took a shot in the dark and decided to register for another account (under a different email address); this way I would receive a new token and could give that a try.

… And oh là là! My script started working again! 🙂 Of course, as I continued to add more functionality and test my script, I ran back into the same error, but this time I knew exactly how to resolve it.

So I registered for a third account (to yet again generate a new token). Fortunately, I completed my weekend project. See attachments Plot2.png and Plot3.png for pretty graphs.

Now that I could see the forest through the trees, it was off to the cloud! I anticipated that it would take me a full week to explore Azure VMs, but it actually only took a few days to wrap my head around them…

So this left me a chance to pivot again, this time to a Data Warehouse/Data Lake solution built for the cloud, turning the forecast for the rest of the week to Snow.

Here is a summary of what I did this week:

Sunday:

  • Developed a Pandas/Python script, in conjunction with the iexfinance & matplotlib modules, to build graphs showing the historical price of MSFT for 2020 and a comparison of MSFT vs. INTC for Jan 2nd – April 3rd, 2020 (a rough sketch follows below)
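Here is a rough reconstruction of that Sunday script; the exact column names and token handling depend on the iexfinance version, so treat it as a sketch rather than the script itself.

```python
# Sketch: pull historical prices via iexfinance and plot the comparison.
from datetime import datetime
import os

import matplotlib.pyplot as plt
from iexfinance.stocks import get_historical_data

token = os.environ["IEX_TOKEN"]                  # free-tier token
start, end = datetime(2020, 1, 2), datetime(2020, 4, 3)

msft = get_historical_data("MSFT", start, end, output_format="pandas", token=token)
intc = get_historical_data("INTC", start, end, output_format="pandas", token=token)

plt.plot(msft.index, msft["close"], label="MSFT")
plt.plot(intc.index, intc["close"], label="INTC")
plt.title("MSFT vs INTC closing price, Jan 2 - Apr 3 2020")
plt.legend()
plt.savefig("Plot3.png")
```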

Monday: (Brief summary)

  • Followed previous steps to build the plumbing on Azure for my small SQL Server farm (see previous status on AWS EC2 for more details)
  1. Created a Resource Group
  2. Created an Application Security Group
  3. Created 6 small Windows VMs in the same region and availability zone
  4. Joined them to the Windows domain

Tuesday: (Brief summary)

  1. Created Windows Failover Cluster
  2. Installed SQL Server 2016
  3. Set up and configured AlwaysOn AGs and Listeners

Observations on Azure VMs:

Cons

  • Azure VMs are very slow the first time they are brought up after a build
  • Azure VMs have a longer provisioning time than EC2 instances
  • No UI option to perform bulk tasks (unlike the AWS UI); the only option is templating through scripting
  • Cannot move a Resource Group from one geographical location to another, unlike VMs and other objects within Azure
  • When deleting a VM, its child dependencies are not dropped (Security Groups, NICs, Disks); perhaps this is by design?
    • Objects need to be dissociated from their groups and then deleted to clean up orphaned objects

Neutral

  • Easy to migrate VMs to higher T-Shirt Sizes
  • Easy to provision Storage Volumes per VM
  • Application Security Groups can be used to manage TCP/UDP traffic for entire resource group

Pros

  • You can migrate existing storage volumes to premium or cheaper storage seamlessly
  • Less network administration
    • Fewer TCP/UDP ports need to be opened, especially ports native to Windows domains
  • Very easy to build Windows Failover Clustering services
    • Natively works in the same subnet
    • Less configuration to get connectivity working than AWS EC2
  • Very easy to configure SQL Server 2016 Always On
    • No need to create 5 listeners (one per subnet) for a given AG
    • 1 listener per AG
  • Free cost, performance, and operational excellence recommendations pop up after login

Wednesday:

  • Registered for an eval account for a Snowflake instance
  • Attended the Zero to Snowflake in 90 Minutes virtual lab
    • Created databases, data warehouses, user accounts, and roles
    • Created stages to be used for data import
    • Imported data sources (data in S3 buckets, in CSV and JSON formats) via the Web UI and the SnowSQL command-line tool
    • Ran various ANSI-92 SQL queries to generate reports from Snowflake (see the sketch below)
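The lab itself used the web UI and the SnowSQL CLI, but to keep everything in Python, here is the same flow sketched with the snowflake-connector-python package; the account, warehouse, stage, table, and bucket names are all placeholders.

```python
# Sketch of the lab flow via the Python connector: stage an S3 location,
# bulk-load a CSV into a table, then run a plain SQL report.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # placeholder account locator
    user="MCS",
    password="********",
    warehouse="COMPUTE_WH",
    database="DEMO_DB",
    schema="PUBLIC",
)

cur = conn.cursor()
cur.execute("CREATE OR REPLACE STAGE trips_stage URL = 's3://my-bucket/trips/'")
cur.execute("COPY INTO trips FROM @trips_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)")
cur.execute(
    "SELECT start_station, COUNT(*) FROM trips "
    "GROUP BY start_station ORDER BY 2 DESC LIMIT 10"
)
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()
```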

Thursday:

Friday:

**Bonus Points**

  • More Algebra – Regents questions. 
  • More with conjugating verbs in Español (AR Verbs)

Next Steps.. 
Below are some topics I am considering for my voyage next week:

  •  SQL Server Advanced Features:

           – Columnstore Indexes
           – Best practices around SQL Server AlwaysOn (Snapshot Isolation/sizing of Tempdb, etc)

  • Data Visualization Tools (i.e. Looker)
  • ETL Solutions (Stitch, Fivetran)
  • Processing and transforming data / exploring data through ML (i.e. Databricks)
  • Getting Started with Kubernetes with an old buddy (Nigel)

Stay safe and Be well

—MCS 

Week of April 3rd

“The other day, I met a bear. A great big bear, a-way out there.”

As reported last week, I began to dip my toe into the wonderful world of Python. I wasn’t able to complete the Core Python: Getting Started Pluralsight course by Robert Smallshire and Austin Bingham, so I had to do some extended learning over the weekend. Last weekend, I was able to finish the “Iteration and Iterables” module, which I had started last Friday, and then spent the rest of the weekend on the “Classes” module, which was nothing short of a nightmare. I spent numerous hours trying to debug my horrific code and rewatching the lessons in the module over and over again. This left me with the conclusion that I just simply don’t get object-oriented programming and probably never will…


Ironically, it’s a conclusion I derived almost 25 years ago when I attended my last class at the University at Albany, which was in C++ object-oriented programming. Fortunately, I escaped that one with a solid D- and was able to pass Go, collect $200, and move on to the working world. So after languishing with classes in Python, on Monday I was able to proceed with the final module on File IO and Resource Management, which seemed more straightforward and practical.


On Tuesday, life got a whole lot easier when I installed Anaconda Navigator. Up until this point I had been writing my Python scripts in the TextWrangler editor on the Mac, which was not ideal.


Through Anaconda, I discovered the Spyder IDE, which was like a breath of fresh air. No longer did I have to worry about aligning spaces, opening and closing parentheses, or curly and square brackets. Now, with a proper IDE environment, I was able to begin my journey into the Pandas jungle…


Here is what I did:

  1. Completed the Pandas Fundamentals course
  2. Installed Anaconda, the pandas Python module, and SQLite
  3. Created Pandas/Python scripts (a condensed sketch of a few of these steps follows this list):
     1. Read in a CSV file (Tate Museum collection) and output it to a pickle file
     2. Read in a JSON file and wrote the output to the screen
     3. Traversed directories with multiple JSON files and wrote the output to a file
     4. Performed iteration, aggregation, and filtering (transformation)
     5. Created indexes on data from the CSV file for faster retrieval of data
     6. Read a data source (Tate Museum collection) and output data to Excel spreadsheets, with multiple columns, multiple sheets, and colored column options
     7. Connected to an RDBMS using the SQLAlchemy module (used a SQLite database as a POC), creating a table and writing data to it from a data source (pickle file)
     8. Created JSON file output from a data source (pickle file)
     9. Created a graph using the matplotlib.pyplot and matplotlib modules. See attachment.
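Here is a condensed sketch of a few of those steps: read the CSV, aggregate, pickle it, and push a slice into SQLite via SQLAlchemy. The file and column names are assumptions based on the Tate collection dataset.

```python
# Condensed pandas sketch: CSV -> pickle, a quick aggregation, and a SQLite load.
import pandas as pd
from sqlalchemy import create_engine

# Read the source CSV and save it as a pickle for faster reloads.
artworks = pd.read_csv("artwork_data.csv", low_memory=False)
artworks.to_pickle("artworks.pickle")

# A little aggregation/filtering: artworks per artist, largest first.
per_artist = (
    artworks.groupby("artist")
    .size()
    .sort_values(ascending=False)
)
print(per_artist.head(10))

# Write a slice of the data to a SQLite table through SQLAlchemy.
engine = create_engine("sqlite:///tate.db")
artworks[["artist", "title", "year"]].to_sql(
    "artworks", engine, if_exists="replace", index=False
)
```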

**Bonus Points** Continued to dredge up old nightmares from freshman year of high school as I took a stroll down memory lane with distributing binomials, perfect square binomials, difference-of-squares binomials, factoring perfect square trinomials, factoring differences of squares, F.O.I.L., and other algebraic muses.

In addition, I revisited conjugating verbs en español and writing descriptions (en español) for 9 family members.

Next Steps..
There are many places I still need to explore..

Below are some topics I am considering:

  • A Return to SQL Server Advanced Features:

            – Columnstore Indexes
            – Best practices around SQL Server AlwaysOn (Snapshot Isolation/sizing of Tempdb, etc)

  • Getting Started with Kubernetes with an old buddy (Nigel)
  • Getting Started with Apache Kafka 
  • Understanding Apache ZooKeeper and its use cases

I will give it some thought over the weekend and start fresh on Monday.
Stay safe and Be well

—MCS 

Week of March 27th

I am an enchanter. There are some who call me…Tim.

This week I decided to take a break from SQL Server with AlwaysOn on AWS EC2 and focus my attention on the Python programming language. Despite gracefully shutting down all my instances, I am still racking up over ~$120 in charges, and growing… This is the stuff they don’t tell you about in the brochure… In either case, I will hold on to my environment for now and carry on with the re-tooling of my skills…

So far, I have learned quite a lot about Python. Firstly, it’s quite an extensive programming language with many use cases, and of course, no matter how hard I try, I still suck at programming… It’s just not in my DNA. However, the goal was not to reinvent myself with a skill that’s clearly one of my weaknesses, but to have a good general understanding of the syntax and the value proposition. And that, I am happy to report, is going quite well so far.

Here is what I did with Python: 

  1. Completed the Pluralsight course Python: The Big Picture by Jason Olson
  2. Installed Homebrew on Amanda’s Mac along with the latest version of Python 3.8.x and some nice extras
  3. Created several Python scripts to be used with the training modules and took notes
  4. Completed the following modules of Core Python: Getting Started by Robert Smallshire and Austin Bingham
  5. To reinforce my learnings, I recite “The Zen of Python” each and every night before I go to bed (see below)
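For anyone who wants to recite along, the poem ships with the interpreter itself:

```python
# Prints "The Zen of Python" by Tim Peters to stdout.
import this
```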

**Bonus Points** I spent over 4 hours this week working on parabolas and other fun algebraic equations.

Not to mention a little español and some environmental science…

I didn’t complete the full course as I had hoped to this week. 🙁 I plan to finish the remaining sections of “Iteration and Iterables” and the two remaining modules on “Classes” and “File IO and Resource Management” over the weekend… Doing work on the weekend is not a habit I am looking to make a regular practice.

Next Steps.. 

Pandas Fundamentals

Have a nice weekend!

–MCS