Close

Power BI Data Flow

Power BI Data Flow is different from an Azure Data Factory Data Flow, but what is it?

Let’s start with a regular Power BI process. Here you run so-called power queries that feed data into Power BI datasets. In other words, the output of a power query will be loaded into in-memory storage. The power query is executed as part of your PowerBI Desktop process, which makes your Power BI solution slower.

After discussing with some BI experts, I learned the following:

  • A PowerBI dataset can contain multiple tables. As an example you can have a base table with orders with multiple mapping tables for let’s say customers and products. All these tables can be in one dataset.
  • A PowerBI dataset is not stored in-memory. It’s stored in a Power BI workspace. That means: one Power BI dataset can be used by multiple Power BI reports. You can assign users specific rights on a workspace or on a dataset within that workspace. As an example: you can give financial users explicit rights to export the output data set of a data flow to csv.

Power BI Dataflow is a power query process that runs in the cloud independently from any Power BI reports.

With a Power BI Dataflow, output of the power query is stored in intermediate storage, that is a csv file in Azure Data Lake storage. If you don’t have an Azure subscription, you can create a Power BI Pro (or Premium) account and have 10Gb of storage space included -> that is Azure Data Lake internal usage. Next you can use Azure Data Flow as the data source to get your data from. That means you will only have to run the Power Query once and store the output in a csv file.

Benefits of using Power BI Data Flow

  • Run once, use multiple times. You will only have to create the Power Query Table once and use it in multiple reports. Examples of reusable query tables: Products, Customers. Note that you can get data from multiple power query tables in one pbix file/report. 
  • Power queries that must be run at different schedules.
  • Create a Data Warehouse in Aure Data Lake when you don’t use a centralized data warehouse.

A Power BI Data Flow cannot be created in Power BI Desktop. It’s only available in the Power BI website. Also, you need to be in an organizational workspace, not in the default My Workspace. After you create a data flow, you will see the following screen:

From this screen you can select [Add new tables] which brings you to a screen where you can select an Excel/CSV file, JSON file, Sharepoint library, SQL Server database, Azure Blob or whatever you like. Use the on-prem data gateway if you want to use data from your on-premises network. Next you can select data, multiple tables if you like, and finally save the Data Flow. The connection information and power query will be created behind the scenes. On the Data Flow overview page, you can configure a scheduled refresh.