The Real-Time Operational Analytics feature provides hosting service providers the opportunity to support a key business scenario:
This document describes the following scenarios:
These are the focus of this document.
The audience for concepts explained in this document are the Infrastructure Consultants, Architects and Database Administrators of hosting service providers. Hosting service providers who are considering utilizing their in-house hardware and software investments towards the opportunity to increase bottom line revenue. Along with in-house hardware Hosting service providers can also utilize the Microsoft Hybrid Cloud Platform in incremental stages to offer ancillary services around Real-Time Operational Analytics to benefit their consumers.
The Reference Architecture gives an overview of how Real-Time Operational Analytics technology fits in the overall reference architecture of a service provider. At a high level we can visualize this as three layered architecture which includes Infrastructure, Platform and Application. Each layer has its own importance and association.
The components within infrastructure layer represents the technical capabilities of service provider which are required to host on-premise platform, software and applications (if any). The infrastructure layer of a service provider facilitates IaaS services to consumers to host in-memory columnstore enabled databases on virtualized environment as well as in the form of a dedicated physical hardware, in either scenario the compute power - physical memory and compute cores required to manage in-memory workload are dedicated and can be scaled up or scaled down on the need basis. The network component of this layer represent the physical network switches, routers, firewalls etc. and is used to establish two-way communication between infrastructure, software, platform and application.
Platform layer works in conjunction with infrastructure and application to provide database service, this service offers hosting of both in-memory enabled and disk based databases. Consumers' databases get hosted on this platform and then they can run both analytics and OLTP workloads on the same database tables at the same time by defining non-clustered columnstore index on specific tables based on requirement thereby eliminating the need for ETL and a data warehouse and so any data latency. Enabling Real-Time Operational analytics in a given database is very straight forward, one key thing which we have to be keep in consideration is creating a Filtered non-clustered columnstore index only on the warm or slowly changing data.
Application layer works in conjunction with infrastructure and platform and provide data to end users for their analysis. There could be a variety of applications which might want to gain access to data from a Real-Time Operational Analytics database and there are certain considerations one has to keep in view when connecting to Real-Time Operational Analytics database.
Accessing Real-Time Operational Analytics database from a desktop application is possible and there are no special changes or considerations one has to make. It is seamless to access Real-Time Operational Analytics database from a desktop because under the cover everything is taken care at the database platform side and from an application perspective it is as good as connecting to any standard SQL database. This means be it a .NET thick client or a Java or any other application as long as they can connect to the databases they would be able to leverage Real-Time Operational Analytics database features.
Accessing Real-Time Operational Analytics database from mobile devices such as phones and tablets is seamless and transparent. Mobile apps can connect to Real-Time Operational Analytics databases and leverage this technology. This holds good for different mobile platforms like iOS, Android and Windows.
A Mobile device such as phone and tablet can connect to a Real-Time Operational Analytics database using APIs or a direct mode in either cases from development point of view there is no change as in-memory Real-Time Operational Analytics technology works at server side. However high performance of the system becomes instantly visible based on query response time.
Accessing Real-Time Operational Analytics database from Web apps is easy and seamless just like accessing any other database in SQL Server. Web application developed on variety of platform like .NET as well as third party platform like Java, Python, PHP etc., can leverage Real-Time Operational Analytics database seamlessly because this technology works at server side, this means web developers from any platform need not to worry about any changes into their SQL Server database access method. Same hold true for web based reporting applications, these can connect to Real-Time Operational Analytics database without any change and can leverage high performance response back from the database.
This architecture provides an overview of the SQL Server engine with the OLTP and analytics components for Real time operational analytics.
Real-time analytics uses an updateable columnstore index on a rowstore table. The columnstore index maintains a copy of the data, so the OLTP and analytics workloads run against separate copies of the data. This minimizes the performance impact of both workloads running at the same time. SQL Server automatically maintains index changes so OLTP changes are always up-to-date for analytics. With this design, it is possible and practical to run analytics in real-time on up-to-date data. This works for both disk-based and memory-optimized tables. The image below shows one possible configuration using Analysis Server in Direct Query mode, but you can use any analytics tool or custom solution at your disposal. This solution addresses the drawbacks mentioned above since the analysis happens directly on the operational data.
The schema for OLTP databases is highly normalized (i.e. with minimal data duplication) which when used for analytic queries could lead to poor performance primarily because of complexity of joins between larger number of tables. The significant query speed up made possible by columnstore index (i.e. NCCI) can overcome the complexity of query and still deliver most analytics queries in few seconds. At this point, it is important to emphasize the analytics query performance with real-time operational analytics will not be as fast as you can get with a dedicated DW but the key benefit is the ability to do analytics in real-time.
With the introduction of non-clustered columnstore technology, customers are seeing dramatic (up to 100x) speed up in the performance of analytics queries and are increasingly looking for the ability to do analytics in real-time because implementing ETL adds a time delay and so the analytics queries will run on old data. For many businesses this delay is unacceptable because the business depends on analyzing data in real-time. So, organizations are transitioning their various applications to implement Real-Time Operational Analytics. Again, hosting service providers should be the direct benefactors.
Hosting service providers can leverage the innovative solutions discussed in this document by running both analytics and OLTP workloads on the same database tables at the same time and thereby eliminating the need for ETL and a data warehouse and so any data latency. With a combination of hybrid architecture, a service provider can not only offer to run these mission critical workloads in their environment but they can also offer enterprise grade security and disaster recovery. The innovation is really Microsoft constructing, proving, and documenting how the Microsoft Real-Time Operational Analytics database with a hybrid architecture meets and exceed hosting service provider needs.
A high-performance Real-Time Analytics scenario is one where service provider can offer high performance analytics in real-time on big OLTP or operational databases with minimum or no data latency, for such a scenario there are many parameters which must be considered from an in-memory reference architecture point of view like:
Real-Time Operational Analytics works both on disk-based and memory-optimized tables so that you can leverage it without any changes to your applications.
Real-Time Operational Analytics database uses an updateable columnstore index that speeds up and increases the performance of analytic queries. Following are the design considerations for a columnstore Index:
Filtered non-clustered columnstore index gives consumers the ability to create NCCI, meeting a filtered condition to improve the query performance by minimizing the overhead of maintaining a non-clustered columnstore index.
Figure: Accessing Hot data from Btree Index, warm data from columnstore
To make the right business and operational decisions about your data, it's important to determine the temperature of data based on frequency of access:
Below are the considerations and best practices for improving performance using Filtered Clustered Index by defining it only on warm data:
There are many workloads where there is no natural column to identify 'hot' rows and for this reason they can't benefit from filtered NCCI. For that cases, compression delay gives you an option to keep the 'active' rows in delta rowgroup and only transition these rows to compressed rowgroup after a specified delay. By keeping rows in delta rowgroups longer, you can reduce overall maintenance overhead of NCCI as described below:
Reducing Index fragmentation by keeping hot data in delta rowgroups using Compression_delay
Tables with columnstore index may get significantly fragmented (i.e. deleted rows) if the workload updates/deletes rows that have been compressed. A fragmented columnstore index leads to inefficient utilization of memory/storage. Besides inefficient use of resources, it also negatively impacts the analytics query performance because of extra IO and the need to filter the deleted rows from the result set.
You can minimize index fragmentation using COMPRESSION_DELAY option.
Accessing data in multi-dimensional way from Real-Time Operational Analytics database is simple. We can query data in a multi-dimensional way by using Direct Query mode, Real-Time Operational Analytics technology helps in aggregating and returning data to multi-dimensional layer in a robust way and this in turns improves the experience of performing analytics on a real-time operational database.
From an application perspective accessing any Real-Time Operational Analytics database is as good as you are accessing any other regular SQL database. This means technically you can run Real-Time Operational Analytics without making any changes to your application by just enabling a columnstore index with memory optimized or disk based tables. This brings are lot of relief to service provider as well as consumer.
Even though you can minimize the columnstore index maintenance by using a filtered columnstore index, analytics queries still require significant computing resources (CPU, IO, memory) which impact the operational workload performance. For most mission critical workloads, it is recommended to use the AlwaysOn configuration. In this configuration, you can eliminate the impact of running analytics by offloading it to a readable secondary.
If you run your application in High Availability configurations using AlwaysOn, using this configuration, you can offload your analytics to one or more Readable Secondary as shown in the figure below. In this configuration, the only analytics overhead at primary replica is the additional NCCI.
With SQL Server 2016 Consumer's application can be fast as well as fully secured as it supports TDE (Transparent Data Encryption) with which memory-optimized tables data on disk will be encrypted. TDE can be enabled on a Real-Time Operational Analytics database.
This allows for greater security along with the performance enhancements provided by memory-optimization.
NOTE: Encrypted data compresses significantly less than equivalent unencrypted data. If TDE is used to encrypt a database, backup compression will not be able to significantly compress the backup storage. Therefore, using TDE and backup compression together is not recommended.
Real-Time Operational Analytics databases are backed up as part of regular database backups so you don't need to do anything special to manage backups of databases. However, if there are any memory optimized tables in database then the backup will be much larger than compared to actual memory footprint. A full backup of a database with one or more memory-optimized tables consists of the allocated storage for disk-based tables (if any), the active transaction log, and the data and delta file pairs (also known as checkpoint file pairs) for memory-optimized tables. We can also do differential backups of disk based databases or databases with memory optimized tables and transactional backups of any disk based Real-Time Operational Analytics database.
In the similar way the restore of Real-Time Operational Analytics databases is similar to any regular database restore, only addition to this is that if there are any memory optimized tables in the database then they must be loaded into memory before database is available for user access.
In this document we studied how Real-Time Operational Analytics speed up the performance of analytics queries by working directly on operational workload and eliminating the need for ETL and a data warehouse. Also, it works both on disk-based and memory-optimized tables so that you can leverage it without any changes to your applications. Furthermore, Real-time analytics uses an updateable columnstore index that speeds up and increases the performance of analytic queries. We have also seen that how filtered non-clustered columnstore index improve the query performance by minimizing the overhead of maintaining a non-clustered columnstore index and at the same time how to eliminate the impact of running analytics by offloading it to a readable secondary.