Azure DocumentDB – Preparing, Loading and Querying Data

[read this post on Mr. Fox SQL blog]

Recently I have been using Azure DocumentDB (aka CosmosDB) to validate several business use cases for a variety of application purposes.

For those SQL DBA’s and others who are new to Azure CosmosDB, its a recent entrant to the NoSQL document database world, and as its a PaaS document database cloud service it has the agility, scalability and availability of the Azure Cloud.

Being a schema-less Azure PaaS “document database” for my use case I wanted to verify…

  • basic costing and performance levels
  • methods to create valid JSON documents from SQL Server
  • methods to load JSON documents into Azure CosmosDB
  • performing basic like-for-like document query comparisons with SQL Server

Some homework reading for those interested…

 

22 May 2017 [EDIT]

As hinted above, Microsoft have just recently added significant new functionality and also formally renamed Azure DocumentDB as Azure CosmosDBa major evolution of the NoSQL database engine.  The details on the renamed service is here – https://azure.microsoft.com/en-au/blog/dear-documentdb-customers-welcome-to-azure-cosmos-db/

 

And so… let get into the belly of Azure DocumentDB CosmosDB!

Continue reading

Compression Performance with Low Key Selectivity

[read this post on Mr. Fox SQL blog]

When it comes to enabling data compression in SQL Server I suspect most people don’t do a lot of testing as to the impacts of either enabling compression and its ongoing maintenance.  I assume most are interested in its ongoing performance for queries, and you know that’s damn fair enough too!

If you read the Microsoft Data Compression Whitepaper (and I mean whos got time to do that!) then it has some interesting technical tidbits burried deep in there which affect the impacts of applying and managing data compression.

https://technet.microsoft.com/en-us/library/dd894051(v=sql.100).aspx

Here are some of the ones of interest…

SQL Server uses statistics on the leading column to distribute work amongst multiple CPUs, thus multiple CPUs are not beneficial when creating, rebuilding, or compressing an index where the leading column of the index has relatively few unique values or when the data is heavily skewed to just a small number of leading key values – only limited effective parallelism will be achieved in this case.


And this…

Compressing or rebuilding a heap with ONLINE set to ON uses a single CPU for compression or rebuild. However, SQL Server first needs to scan the table—the scan is parallelized, and after the table scan is complete, the rest of the compression processing of the heap is single-threaded.


And this…

When a heap is compressed, if there are any nonclustered indexes on the heap, they are rebuilt as follows:
(a) With ONLINE set to OFF, the nonclustered indexes are rebuilt one by one.
(b) With ONLINE set to ON, all the nonclustered indexes are rebuilt simultaneously.
You must account for the workspace required to rebuild the nonclustered indexes, because the space for the uncompressed heap is not released until the rebuild of the nonclustered indexes is complete.

 

That first one is a cracker – it hit me once when compressing a SQL Server table (600M+ rows) on a 64 core Enterprise SQL Server.  After benchmarking several other data compression activities I thought I had a basic “rule of thumb” (based on GB data size and number of rows)… of which just happened to be coincidence!

This also begs the question of why would you use low selectivity indexes?  Well I can think of a few cases – but the one which stands out the most is the identification of a small number of rows within a greater collection – such as an Index on TYPE columns (ie; [ProcessingStatusFlag] CHAR(1) = [P]rocessed, [U]nprocessed, [W]orking, [F]ailed, etc)

… AND SO – lets do some testing to validate this puppy!

Continue reading

A “Business Intelligence” (BI) Definition

[read this post on Mr. Fox SQL blog]

Over the years I have presented many times to various clients describing Business Intelligence (BI) solutions using the Microsoft BI solutions stack.

In all of my sessions, regardless of the specific focus, each time I always start with the same 2 content slides.

  • The first is always the positioning graphic from the Gartner Magic Quadrant for Business Intelligence & Analytics.  This shows all top BI vendors and how they are positioned against each other, and is republished yearly (typically in Feb).  This is often a great introduction as to why Microsoft BI is such a great play. If interested an introduction to the paper is here, but you can also find the full report on various web sites (https://www.gartner.com/doc/2989518/magic-quadrant-business-intelligence-analytics)
  • The second is always a definition of exactly what Business Intelligence (BI) actually is, and this is the purpose of this short blog.

And so, lets dig down into a business definition for Business Intelligence!

Continue reading

Calculating Table Partition Sizes in Advance

[read this post on Mr. Fox SQL blog]

Continuing on with my Partitioning post series, this is part 7.

The partitioning includes several major components of work (and can be linked below);

  1. partitioning large existing non-partitioned tables
  2. measuring performance impacts of partitioned aligned indexes
  3. measuring performance impacts of DML triggers for enforcing partitioned unique indexes
  4. rebuilding tables that are already partitioned (ie. apply a new partitioning scheme)
  5. implementing partial backups and restores (via leveraging partitions)
  6. implementing partition aware index optimisation procedures
  7. Calculating table partition sizes in advance

This blog post deals with calculating partitioning sizes in advance.

Sometimes (just sometimes) you need to calculate the size your table partitions upfront before you actually go to the pain and effort of partitioning (or repartition) a table.  Doing this helps with pre-sizing the database files in advance instead of having them auto-grow many many times over in small increments as you cut data over into the partitions.

As a quick aside…

  • The negative performance impacts of auto-shrink are universally well known (er, for DBA’s that is!), however I rarely hear people talk about the less universally well known negative performance impacts of auto-grow quite so much.
  • Auto-Growing your database files in small increments can cause physical fragmentation in the database files on the storage subsystem and cause reduced IO performance.  If you are interested you can read about this here https://support.microsoft.com/en-us/kb/315512

Now – back to what I was saying about pre-sizing table partitions…!

I prepared a SQL script which given some parameters can review an existing table and its indexes (whether they are already partitioned or not) and tell you what your partition sizing breakdown would be should that table be partitioned with a given partition function.

I wrote it just for what I needed but it could be expanded more if you are feeling energetic.  The script is at the end of this post.

And so, lets get into the nitty gritty of this estimation script!

Continue reading

Implementing Partition Aware Index Optimisation Procedures

[read this post on Mr. Fox SQL blog]

Continuing on with my Partitioning post series, this is part 6.

The partitioning includes several major components of work (and can be linked below);

  1. partitioning large existing non-partitioned tables
  2. measuring performance impacts of partitioned aligned indexes
  3. measuring performance impacts of DML triggers for enforcing partitioned unique indexes
  4. rebuilding tables that are already partitioned (ie. apply a new partitioning scheme)
  5. implementing partial backups and restores (via leveraging partitions)
  6. implementing partition aware index optimisation procedures
  7. Calculating table partition sizes in advance

This blog post deals with implementing partition aware index optimisation procedures.

And so, lets get into the nitty gritty of the partitioning details!

Continue reading

PASS 2015 Session Report – Azure SQL DW Integration with the Azure Ecosystem & End of PASS 2015

PASS 2015 continues (and finishes up today!) in Seattle.

Its been an amazing conference this year with a few things really hitting home;

  • Amazing technology announcements around SQL 2016 CTP3
  • Incredible advances in almost every component in Azure Data Services
  • Full and seamless SQL/Azure ecosystem integration – and by that I mean both On-Prem and/or within the Azure Cloud.  The story of either On-Prem or Azure Cloud is compelling enough individually, however the Hybrid story is now a reality for SQL and enables dynamic and flexible architectures well beyond what competitors can offer.
  • BUT what astounds me the most is actually the pace of change – barely a day goes by where I don’t receive a new services or feature update related to SQL 2016 CTP3 or Azure.
  • I don’t recall a time (in recent memory) where the step changes have come so thick/fast – its certainly changed from where I started as a DBA on RDB/VMS back in 1994 where patches arrived by mail on tape cartridge! 🙂
  • (As a quick aside a chief designer on RDB was Jim Gray, the same who joined Microsoft to lead the SQL Server architecture to stardom soon after Oracle bought-out and shelved DEC around 1995+)

 

Enough reminiscing already – moving along – Today I attended 5 back-back sessions, and again I cannot blog about all of them in the time I have (or want to spend), but the one which stands out the most was Azure SQL Data Warehouse and Integration with the Azure Ecosystem by Drew DiPalma of Microsoft.

This session focused specifically on the Azure ecosystem surrounding the Azure SQL Data Warehouse (SQL DW) and how it can seamlessly interact with other Azure components to create different operational solutions.  To me this was very compelling, not necessarily due to the SQL DW technology (which I know well already as the on-prem APS appliance), but more-so as it showed just how easily all parts of Azure can happily work together.

Continue reading

PASS 2015 Session Report – Understanding Real World Big Data Scenarios

PASS 2015 continues in Seattle, and today was my session at 1045am on Using Azure Machine Learning (ML) to Predict Seattle House Prices.  The background and info on my session is here http://www.sqlpass.org/summit/2015/Sessions/Details.aspx?sid=7794

Overall I was pretty happy with how it went - and I think everyone who attended had a lot of fun with some of the games and tests I injected into the presentation.  Everyone had a chance to be a Real Estate Agent :) - and at the same time learn some great methods around performing Azure ML Regression Predictive Analytics.

 

BUT – moving right along – I also attended 3 other sessions today, again I cannot blog about all of them in the time I have, but the one which made me think the most about technology implementations and how they can improves lives was Understanding Real World Big Data Scenarios by Nishant Thacker of Microsoft.

It wasn’t about use cases for big data (as this is a horse already bolted), but more around really innovative and interesting ways the ecosystem of Azure technologies could be deployed to solve some complex business problems, or moreso simply ways to make our lives better!

Continue reading

PASS 2015 Session Report – KeyNote and SQL Server on Azure Virtual Machines: Features, Best Practices & Roadmap

So PASS officially kicked off this morning leading into the next 3 days of back to back sessions.

You could certainly tell that the keynote was on… I mean the dining room was pumping…!

WhereTheBloodyHellAreYou

 

 

Oh that’s right, everyone is at the keynote!

KeyNoteDay1

 

 

So the Keynote session was hosted by Joseph Sirosh Group Vice President, Data Group.

The big tell for the key note was undoubtedly the SQL Server 2016 CTP3 and just whats packed to the rafters within the software.  If you want to learn more about that then I recommended step across to this link here http://blogs.technet.com/b/dataplatforminsider/archive/2015/10/28/sql-server-2016-everything-built-in.aspx

Key Takeaways from the Keynote;

  • SQL 2016 is really a major release that really solidifies the Microsoft view of a solid foot in both the On Prem and In Cloud data platform camps.
  • “The future is both earth and sky!”
  • The release offers much On Prem capability like Polybase (to APS), R integration (advanced analytics), Always Encrypted, SSAS/SSRS improvements
  • The release also provides the ability to seamlessly integrate from On Prem to Azure Cloud – and/or back like Polybase (to HDInsight), Stretch Database – and SQL already has capability to use Azure VM’s for SQL AAG solutions and Azure backups.
  • An interesting takeaway – the human size of human genome is approx 1.5 Gigabytes, or about 2 CDs worth of storage space.  How small do you feel now?

 

I then attended 4 sessions, but today there is really only time to blog about this one, mostly for me it was the most impressive in regards to capability and just how far its come!

The session was SQL Server in Azure Virtual Machines – Features and Best Practices and was presented by Luis Vargas is a Senior Program Manager Lead in the SQL Server team.

Continue reading

PASS 2015 Session Report – Optimize “All Data” with a Modern Data Warehouse Solution

PASS 2015 has kicked off in Seattle, well the precon’s have anyway on Mon & Tue.  The actual conference starts on Wed-Fri!

I attended a precon session today called Optimize “All Data” with a Modern Data Warehouse Solution held by Bradley Ball and Josh Luedeman of Pragmatic Works.

The session had a focus on moderising the corporate data warehouse via focusing on Data Lifecycle Optimisation.

What does that mean?  

Well – It means focusing on a define set of critical technology and business areas around the corporate data warehouse and strategically implementing a managed approach to improving the corporate data warehouse via introduction of technologies and processes.  Specifically this looked at 6 areas around the corporate data warehouse to consider in your approach to modernisation;

  1. Architecture and Configuration
  2. Availability and Continuity
  3. Maintenance and Optimisation
  4. Enterprise BI
  5. Big Data Architecture and Deployment
  6. Business and Predictive Analytics

Continue reading

House Price Predictions @ PASS 2015

The PASS speakers were announced a quite a number of weeks ago now for PASS 2015 in October – and I felt pretty damn honored to be one of the 160 or so selected to speak at the conference (…and as a quick aside I also note a reasonably large contingent of Aussie & Kiwi presenters too!).

It will be my 2nd conference, and first time speaking at it.

Exciting?  Absolutely!  Nervous?  Absolutely!  

So what is my topic I hear you *think* – and why might it be interesting to you? (read on!)

Continue reading