Azure Synapse Analytics

Let’s start from the basics. When do we need a data warehouse? In general, you need a data warehouse when you want to keep historical data separate from transactional data. That is, when you want to keep data analytics separate from data usage in daily operations. The analytical and operational scenarios put very different demands on your database solution. With data analytics you are analyzing large amounts of data. In daily operations, you are performing a lot of reads and writes on your transactional data.

Next question: when do we need a cloud data warehouse? A cloud data warehouse has the following advantages:

  • Limitless scale
  • Useful insights via Power BI or Azure Machine Learning.
  • Unified experience. End-to-end analytics solution. All people that work with data, can work in the same environment with the same data.
  • Code free. 85 native connectors and a visual interface. Still able to use code (T-SQL, Python, Spark and .Net).
  • Data securtity in the form of automated threat detection, data encryption, fine-grained access control (row-level and column level security).
  • Enterprise Data Warehousing.

Note that a number of the above advantages are specific to Azure Synapse Analytics. Notably the unified experience that you get. Azure Synapse Analytics is an end-to-end data analytics solution.

Let’s elaborate a bit futher on the cloud data warehouse or Azure SQL Data Warehouse. Key in a Azure SQL Data Warehouse is that you keep the compute nodes separate from the storage nodes. By separating compute from storage, you can independently size compute power, regardless of storage needs. You can grow or shrink compute power without moving data. You can also pause compute capacity while leaving data intact. That way you only pay for storage outside operational hours and resume compute capacity during operational hours. Remember: storage is cheaper than compute power.

Finally, let’s move to Azure Synapse Analytics. Azure Synapse Analytics combines a number of different technologies. This is illustrated by the following two pictures:

You can create an Azure Synapse Analytics Workspace via the Azure Portal. The workspace name should be globally unique. To create a workspace you need an Azure Data Lake account and a file system. On the Security tab, you can specify a SQL Server admin account that will be the admin on all databases created within the Azure Synapse Analytics workspace. After creating the workspace, you can open Synapse Studio from the Portal. Go the Overview page of your Azure Synapse Analytics workspace.

A very important concept within Azure Synapse Analytics is Massive Parallel Processing (MPP). That means: running multiple compute nodes simultaneously on multiple storage nodes. The parallel queries are coordinated by a so-called control node. Remember: Your queries are typically select queries, not insert or update queries. When SQL Data Warehouse runs a query, the work is divided into 60 smaller queries that run in parallel. Each compute node manages one or more of these parallel queries. A Data Movement Service (DMS) is used to bring the results of these 60 queries back to one result set. You can scale compute power (CPU, memoy, IO) via Data Warehouse Units (DWU).

You can also divide up work by using table partitions. Table partitions enable you to divide your data in smaller groups of data. So, you go from a very large amount of data to smaller chunks of data. You can improve the efficiency and performance of loading data by just loading the data you need. You can also improve query performance. Let’s way you partition your data based on the data entry date or the transaction date. That way you can perform your queries on a subset of your data only.