Timeseries Analytics Capabilities, and Azure Data Explorer (ADX)


I’ve had a few recent conversations where customers/partners were encountering scale concerns in existing timeseries database applications hosted outside of Azure, and wished to explore the native services in Azure to support their list of critical analytical requirements. After reviewing the list, my mind immediately went to the awesome Azure Data Explorer (ADX).

Without going into detail about these scenarios – this blog post explores the types of core functionality that typical timeseries data processing applications seek, and how “out of the box” functionality built into ADX aligns extremely well to meet these challenges head-on.

So given the above, hopefully this breakdown of critical ADX capabilities helps you align your own timeseries database / application needs to Azure Data Explorer!

So what is timeseries anyway?

What exactly is timeseries data? (I hear you think)

Well, my good agnostic and totally unaffiliated friend Wikipedia says – “time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data”.

So it may look something like this… (ref: Wikipedia)

And then it continues to say – “Time series are very frequently plotted via run charts (a temporal line chart). Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, communications engineering, and largely in any domain of applied science and engineering which involves temporal measurements”.

Oh boy, that’s a lot to take in in one sentence, are you still with me? OK, let’s move on…

What is Azure Data Explorer (ADX)?

Azure Data Explorer is a VERY fast and VERY highly scalable data exploration service for log and telemetry data (which BTW is naturally timeseries data). It helps you handle the many data streams emitted by modern software, so you can collect, store, and analyze data.

Azure Data Explorer is ideal for analyzing large volumes of diverse data from any data source, such as websites, applications, IoT devices, and more. This data is used for diagnostics, monitoring, reporting, machine learning, and additional analytics capabilities.

Check out the ADX MS Docs references here

Personally I’ve used Azure Data Explorer in several large-scale timeseries-related projects, and it has hands-down nailed it each and every time. In fact, don’t take it from me, check out one of our recent case studies here for a super cool and innovative Virtual Power Plant (VPP) solution built by AGL Energy here in Australia.

However, while it does just about every aspect of timeseries analytics extremely well, it’s not the type of service you aim to roll on out for every data problem you bump into, oh no!

For example, it’s not a fully featured relational database, it’s not a Spark engine, etc. You get what I mean. Like many well-honed services in Azure it’s super efficient in what it does, but you need to be selective on the types of problems you are looking to solve using ADX, hence this features paper below.

What are the typical timeseries feature categories?

Okie Dokie, now we’ve done the backstory, what are the typical areas that timeseries data processing and analytics applications tend to look out for? These are outlined below…

• Statistical Functions

• Down Sampling (e.g. CQ)

• Time-Stamped Measurement

• Data Summarization

• Data Lifecycle Management

• Data Back-Filling

• Support for Advanced Data Queries

• Ingestion Source; Internal to Azure

• Ingestion Source; External to Azure

• Data Concurrency and Consistency

• Data Compression

• Metadata Tagging

• Reference Data

• High Ingest Rate (write)

• High Query Rate (read)

• Linear Engine Scalability

• Encryption (in transit) – Transport Layer Security (TLS)

• Authentication and Authorization

• Encryption (at rest)

• Backup / Restore

• High Availability (HA) and Fault Tolerance (resiliency)

• Rolling Upgrades and Restarts

• Migration Tools

• Integrate with BI Visualization Tools / Dashboard Platforms

• Integrate with External Tools / Frameworks

• Management Tools – Administration

• Management Tools – Deployment Automation

• Monitoring / Audit / Logs

• Maintenance (SLA)

BTW – I’m not saying this list is exhaustive per se, it’s just a nice summary of the common areas I’ve seen a few times.
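To make one of those categories a bit more concrete, here is a minimal sketch of what "Down Sampling" means in practice: taking fine-grained readings and averaging them into coarser, fixed-width time buckets. This is plain Python for illustration only and is not how ADX does it internally (in ADX you would typically express this in KQL, for example with `summarize` over `bin()`). The function name `downsample_mean` and the sample readings are made up for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Fixed reference point used to align bucket boundaries (an assumption of this sketch).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def downsample_mean(points, bucket):
    """Average (timestamp, value) pairs into fixed-width time buckets.

    points: iterable of (timezone-aware datetime, float)
    bucket: timedelta giving the bucket width
    Returns a dict mapping each bucket start time to the mean value in it.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [running total, count]
    for ts, value in points:
        # Floor the timestamp to the start of its bucket.
        start = ts - ((ts - EPOCH) % bucket)
        sums[start][0] += value
        sums[start][1] += 1
    return {start: total / count for start, (total, count) in sorted(sums.items())}


# Hypothetical sensor: one reading every 10 seconds for two minutes.
first = datetime(2021, 1, 1, tzinfo=timezone.utc)
readings = [(first + timedelta(seconds=10 * i), float(i)) for i in range(12)]

# Down sample the 10-second readings to one averaged point per minute.
per_minute = downsample_mean(readings, timedelta(minutes=1))
for bucket_start, mean_value in per_minute.items():
    print(bucket_start.isoformat(), mean_value)
```

The same bucketing idea sits behind most of the summarization style features in the list: pick a window, group by it, then aggregate.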

To the assessment paper download!

If all of the above sounds of interest – then you can download the ADX Paper here

This whitepaper outlines 29 key functionality areas expected for typical timeseries data processing and analytics applications, and how these are met via the core capabilities provided within Azure Data Explorer (ADX), the massive scale PaaS timeseries database hosted in the Azure cloud.

Information presented in this whitepaper provides a reference view on core ADX “out of the box” functionality (as at the time of writing) against a set of common timeseries analytics and application requirements. All information provided has been referenced back to official MS Docs pages.

NOTE – this paper lists key functionality only, and does not provide performance benchmarks, or any form of industry product comparison.

And what the heck, while you’re at it, feel free to look through the rest of my downloadable presentations here

Conclusion

OK, so there you have it, a quick summary of the main ADX functionality to store and process timeseries data assets.

By the way, if I’ve missed something super obvious, or you think something needs to be changed, then ping me in the comments and I’ll check it out!

Adios for now… however as usual, and as I always say, review this yourself, as your own mileage may vary!

Disclaimer: all content on Mr. Fox SQL blog is subject to the disclaimer found here
