Close

Modern Data Warehouse Architecture

Sources PluralSight:

  • Building a Modern Data Architecture for the Data Driven Enterprise (Dec 2019)
  • Modern Data Warehousing at Scale using Data Factory (Dec 2019)

Traditional business intelligence (BI) has gone through fundamental changes. Let’s look at a classic Data Warehouse Architecture first:

There are many comments to be made about the classic Data Warehouse Architecture:

  • All data go through a complicated ETL proces (Extraction-Transformation-Load). This might be alright a few years ago, but nowadays there are many more data sources. Think of all kinds of streaming data that need to be processed and analyzed instantly. Think of big data.
  • For analysis, you could create cubes via SQL Server Analysis Services (SSAS). First all data had to be loaded into the data warehouse. Next the cube could be recalculated. The calculation of cubes typically took eight to ten hours and was often performed during the night. Needless to say, analysis was a very complex process with much latency and involved (typically 1 day).
  • There’s one important problem that’s not really solved. We need master data about entities like customers, products and suppliers. These data (or facts in a star schema) are typically stored in different primary systems and are not centrally managed.

With the rise of new technology and the cloud, a lot of processes and solutions are relabeled or added. That’s why it’s worth looking at the Modern Data Warehouse and the role integration specialists can play in building a so-called Data Driven Enterprise.

First of all let’s look at a modern Data Warehouse Architecture:

Remarks:

  • Instead of the traditional Extract-Transform-Load process, we now see a process with the following steps: Ingest, Prepare, Transform&Enrich, Serve.
  • Operational data can be directly analyzed to get real-time insights. The idea is to predict instead of look back. Simutaneously and with a relaxed time constraint, you can feed the data into the data warehouse.
  • The Data Storage Area is relabeled to Azure Data Lake.
  • SQL Server Integration Services (SSIS) is replaced by Azure Data Factory.
  • Spark on Azure Data Bricks or Hadoop on HDInsight must be positioned into the data preparation phase.
  • Data is not only served in a data warehouse, but can also be stored in a No SQL store like CosmosDB, if that makes the display/analysis easier. A new kid on the block is Azure Synapse Analytics.

Below is a slightly different view of a modern Data Warehouse Architecture:

  • At the top we see Master Data Management. MDM can come in many different flavors and is a topic in itself. In the picture we see master data coming in from the staging area into MDM. What we don’t see, is that master data are first send to the MDM store and next fed into the data warehouse. The data warehouse is just another destination for MDM, just like the primary systems are.
  • At the bottom we see streaming data coming in and being analyzed in real-time. This processing flow is an important new addition. Performed via Spark on Azure Data Bricks. Note that the data are also fed into the data warehouse. Below is another picture showing real-time data analysis before or even without data storage.

A final point to be made about data or data architecture is that there are many data stores to choose from. Some data stores are optimized for transaction volume (left hand), some data stores are optimized for analytic complexity (right hand).

Remarks:

  • Analytical means real-time analysis.
  • Azure Table storage is also part of the column family databases.
  • You can run into data integrity issues without a RDBMS schema.