All about Prepare data for analysis with Power BI (PL-300)

Organizations often export and store data in files. One possible file format is a flat file.
A flat file is a type of file that has only one data table and every row of data is in the same structure. The file doesn’t contain hierarchies. Likely, you’re familiar with the most common types of flat files, which are comma-separated values (.csv) files, delimited text (.txt) files, and fixed-width files.
Your Excel files might exist in one of the following locations:
- Local – You can import data from a local file into Power BI. The file isn’t moved into Power BI, and a link doesn’t remain to it. Instead, a new semantic model is created in Power BI, and data from the Excel file is loaded into it. Accordingly, changes to the original Excel file aren’t reflected in your Power BI semantic model. You can use local data import for data that doesn’t change.
- OneDrive for Business – You can pull data from OneDrive for Business into Power BI. This method is effective in keeping an Excel file and your semantic model, reports, and dashboards in Power BI synchronized. Power BI connects regularly to your file on OneDrive. If any changes are found, your semantic model, reports, and dashboards are automatically updated in Power BI.
- OneDrive – Personal – You can use data from files on a personal OneDrive account, and get many of the same benefits that you would with OneDrive for Business. However, you’ll need to sign in with your personal OneDrive account, and select the Keep me signed in option. Check with your system administrator to determine whether this type of connection is allowed in your organization.
- SharePoint – Team Sites – Saving your Power BI Desktop files to SharePoint Team Sites is similar to saving to OneDrive for Business. The main difference is how you connect to the file from Power BI. You can specify a URL or connect to the root folder.
Using a cloud option such as OneDrive or SharePoint Team Sites is the most effective way to keep your file and your semantic model, reports, and dashboards in Power BI in-sync. However, if your data doesn’t change regularly, saving files on a local computer is a suitable option.
You might have to change the location of a source file for a data source during development, or if a file storage location changes. To keep your reports up to date, you’ll need to update your file connection paths in Power BI.
Power Query provides many ways for you to accomplish this task, so that you can make this type of change when needed.
- Data source settings
- Query settings
- Advanced Editor
If you are changing a file path, make sure that you reconnect to the same file with the same file structure. Any structural changes to a file, such as deleting or renaming columns in the source file, will break the reporting model.
- Select Data source settings in Power Query.
- In the Data source settings window, select your file and then select Change Source.
- Update the File path or use the Browse option to locate your file, select OK, and then select Close.
If your organization uses a relational database for sales, you can use Power BI Desktop to connect directly to the database instead of using exported flat files.
- Get data from the SQL Server
- Next to the Get Data button are quick access data source options, such as SQL Server.
- Your next step is to enter your database server name and a database name in the SQL Server database window.
- The two options in data connectivity mode are: Import (selected by default, recommended) and DirectQuery. Mostly, you select Import.
- Other advanced options are also available in the SQL Server database window, but you can ignore them for now.
- You’ll be prompted to sign in with a username and password. You’ll have three sign-in options:
- Windows – Use your Windows account (Azure Active Directory credentials).
- Database – Use your database credentials. For instance, SQL Server has its own sign-in and authentication system that is sometimes used. If the database administrator gave you a unique sign-in to the database, you might need to enter those credentials on the Database tab.
- Microsoft account – Use your Microsoft account credentials. This option is often used for Azure services.
- Select a sign-in option, enter your username and password, and then select Connect.
- Select the check box(es) of the table(s) that you want to bring in to Power BI Desktop, and then select either the Load or Transform Data option.
- Load – Automatically load your data into a Power BI model in its current state.
- Transform Data – Open your data in Microsoft Power Query, where you can perform actions such as deleting unnecessary rows or columns, grouping your data, removing errors, and many other data quality tasks.
Another way you can import data is to write an SQL query to specify only the tables and columns that you need.
To write your SQL query, on the SQL Server database window, enter your server and database names, and then select the arrow next to Advanced options to expand this section and view your options.
In the SQL statement box, write your query statement, and then select OK. In this example, you’ll use the Select SQL statement to load the ID, NAME and SALESAMOUNT columns from the SALES table.
After you create a data source connection and load data into Power BI Desktop, you can return and change your connection settings at any time.
This action is often required due to a security policy within the organization, for example, when the password needs to be updated every 90 days. You can change the data source, edit permissions or clear permissions.
- On the Home tab, select Transform data, and then select the Data source settings option.
- From the list of data sources that are displayed, select the data source that you want to update.
- Then, you can right-click that data source to view the available update options or you can use the update option buttons on the lower left of the window.
- Select the update option that you need, change the settings as required, and then apply your changes.
You can also change your data source settings from within Power Query.
- Select the table, and then select the Data source settings option on the Home ribbon
- You can go to the Query Settings panel on the right side of the screen and select the settings icon next to Source (or double Select Source).
- In the window that displays, update the server and database details, and then select OK.
SQL stands for Structured Query Language and is a standardized programming language that is used to manage relational databases and perform various data management operations.
Consider the scenario where your database has a large table that is comprised of sales data over several years. Sales data from 2009 isn’t relevant to the report that you’re creating.
This situation is where SQL is beneficial because it allows you to load only the required set of data by specifying exact columns and rows in your SQL statement and then importing them into your semantic model. You can also join different tables, run specific calculations, create logical statements, and filter data in your SQL query.
- The SQL query starts with a Select statement, which allows you to choose the specific fields that you want to pull from your database. In this example, you want to load the ID, NAME, and SALESAMOUNT columns.
- FROM specifies the name of the table that you want to pull the data from. In this case, it’s the SALES table.
- When using an SQL query to import data, try to avoid using the wildcard character (*) in your query. If you use the wildcard character (*) in your SELECT statement, you import all columns that you don’t need from the specified table.
- The wildcard character (*) will import all columns within the Sales table. This method isn’t recommended because it will lead to redundant data in your semantic model, which will cause performance issues and require extra steps to normalize your data for reporting.
- All queries should also have a WHERE clause. This clause will filter the rows to pick only filtered records that you want. In this example, if you want to get recent sales data after January 1st, 2020, add a WHERE clause. The evolved query would look like the following example.
It’s a best practice to avoid doing this directly in Power BI. Instead, consider writing a query like this in a view.
A view is an object in a relational database, similar to a table. Views have rows and columns, and can contain almost every operator in the SQL language.
If Power BI uses a view, when it retrieves data, it participates in query folding, a feature of Power Query.
Query folding will be explained later, but in short, Power Query will optimize data retrieval according to how the data is being used later.
Dynamic reports are reports in which the data can be changed by a developer according to user specifications.
- Dynamic reports are valuable because a single report can be used for multiple purposes.
- If you use dynamic reports, you’ll have fewer individual reports to create, which will save organizational time and resources
- You can use parameters by determining the values that you want to see data for in the report, and the report updates accordingly by filtering the data for you.
- Creating dynamic reports allows you to give users more power over the data that is displayed in your reports; they can change the data source and filter the data by themselves.
Create dynamic reports for individual values – follow the below steps
- To create a dynamic report, you first need to write your SQL query.
- Then use the Get data feature in Power BI Desktop to connect to the database.
- After you have entered your server details, in the SQL Server database window, select Advanced options.
- Paste the SQL query into the SQL statement box and then select OK.
- When the connection is made, the data is shown in the preview window.
- Select Edit to open the data in Power Query Editor.
- Next, you create the parameter by following these steps:
- On the Home tab, select Manage parameters > New parameter.
- On the Parameters window, change the default parameter name to something more descriptive so that its purpose is clear. In this case, you change the name to SalesPerson.
- Select Text from the Type list and then select Any value from the Suggested value list.
- Select OK.
- A new query is shown for the parameter that you created.
- Now, you need to adjust the code in SQL query to assess your new parameter:
- Right-click Query1 and then select Advanced editor.
- Replace the existing value in the execute statement with an ampersand (&) followed by your parameter name (SalesPerson), as illustrated in the following image.
- Make sure that no errors are shown at bottom of the window and then select Done.
- Though you don’t see a difference on the screen, Power BI ran the query.
- To confirm that the query was run, you can run a test by selecting the parameter query and entering a new value in the Current Value box.
- A warning icon might display next to the query. If so, select that query to view the warning message, which states that permission is required to run this native database query. Select Edit Permission and then select Run.
- When the query runs successfully, the parameter displays the new value.
- Select Close and Apply to return to the report editor.
- Now, you can apply the parameter to the report:
- Select Edit queries > Edit parameters.
- On the Edit Parameters window, enter a new value and then select OK.
- Select Apply changes and then run the native query again.Now, when you view the data, you see the data for the new value that was passed through the parameter.
- You can now create a report that displays data for one particular value at a time. More steps are needed to display data for multiple values at the same time.
Create dynamic reports for multiple values – Follow the below steps
- To accommodate multiple values at a time, you first need to create a Microsoft Excel worksheet that has a table consisting of one column that contains the list of values.
- Next, use the Get data feature in Power BI Desktop to connect to the data in that Excel worksheet, and then follow these steps:
- On the Navigator window, select Edit to open the data in Power Query Editor, where you see a new query for the data table.
- Rename the column in the table to something more descriptive.
- Change the column data type to Text so that it matches the parameter type and you avoid data conversion problems.
- In the query Properties section, change the name of the data source to something more descriptive. For this example, enter SalesPersonID.
- Next, you need to create a function that passes the new SalesPersonID query into Query1:
- Right-click Query1 and then select Create function.
- Enter a name for the function and then select OK.
- Your new function appears in the Queries pane.
- To ensure that Query1 doesn’t show up in the field list for the report, which could potentially confuse users, you can disable it from loading in the report by right-clicking Query1 again and then selecting Enable load (selected by default) to disable the feature.
- Select the SalesPersonID query that you loaded from the Excel worksheet and then, on the Add Column tab, select Invoke custom function to run the custom function that you created.
- On the Invoke Custom Function window, select your function from the Function query list.
- The New column name updates automatically and the table that contains the values that you’re going to pass through the parameter is selected by default.
- Select OK and, if necessary, run the native query.
- A new column for your GetSalesFromSalesPerson function appears next to the SalesPersonID column.
- Select the two arrows icon in the new column header and then select the check boxes of the columns that you want to load. This section is where you determine the details that are available in the report for each value (sales person ID).
- Clear the Use original column name as prefix check box at the bottom of the screen because you don’t need to see a prefix with the column names in the report.
- Select OK.
- You should be able to view the data for the columns that you selected, for each value (sales person ID).
- If necessary, you can add more values (sales people IDs) to the SalesPersonID column in the Excel worksheet, or you can change the existing values.
- Save your changes and then return to Power Query Editor.
- On the Home tab, select Refresh Preview, and then run the native query again (if necessary). You should see the sales from the new sales people IDs that you added into the worksheet.
- Select Close and Apply to return to the report editor, where you see the new column names in the Fields pane.
- Now, you can start building your report.
Get data from a NoSQL database – follow the below steps
- Some organizations don’t use a relational database but instead use a NoSQL database.
- A NoSQL database (also referred to as non-SQL, not only SQL or non-relational) is a flexible type of database that doesn’t use tables to store data.
- The application uses Cosmos DB, a NoSQL database, as the data repository.
- Data is stored as JSON documents, which are open standard file formats that are primarily used to transmit data between a server and web application.
- You need to import this data into a Power BI semantic model for reporting.
- In this scenario, you’ll use the Get data feature in Power BI Desktop. However, this time you’ll select the More… option to locate and connect to the type of database that you use. In this example, you’ll select the Azure category, select Azure Cosmos DB, and then select Connect.
- On the Preview Connector window, select Continue and then enter your database credentials. In this example, on the Azure Cosmos DB window, you can enter the database details.
- You can specify the Azure Cosmos DB account endpoint URL that you want to get the data from (you can get the URL from the Keys blade of your Azure portal).
- Alternatively, you can enter the database name, collection name or use the navigator to select the database and collection to identify the data source.
- If you’re connecting to an endpoint for the first time, as you are in this example, make sure that you enter your account key. You can find this key in the Primary Key box in the Read-only Keys blade of your Azure portal.
Import a JSON file – follow the below steps
- If you’re working with data stored in JSON format, it’s often necessary to extract and normalize the data first.
- This is because JSON data is often stored in a nested or unstructured format, which makes it difficult to analyze or report on directly.
- After you’ve connected to the database account, the Navigator window opens, showing a list of databases under that account.
- Select the table that you want to import. In this example, you’ll select the Product table.
- The preview pane only shows Record items because all records in the document are represented as a Record type in Power BI.
- Select the Edit button to open the records in Power Query.
- In Power Query, select the Expander button to the right side of the Column1 header, which displays the context menu with a list of fields.
- Select the fields that you want to load into Power BI Desktop, clear the Use original column name as prefix checkbox, and then select OK.
- Review the selected data to ensure that you’re satisfied with it, then select Close & Apply to load the data into Power BI Desktop.
- The data now resembles a table with rows and columns. Data from Cosmos DB can now be related to data from other data sources and can eventually be used in a Power BI report.
Get data from online services – follow the below steps
- To support their daily operations, organizations frequently use a range of software applications, such as SharePoint, OneDrive, Dynamics 365, Google Analytics and so on.
- These applications produce their own data. Power BI can combine the data from multiple applications to produce more meaningful insights and reports.
- Connect to data from sharepoint list – follow the below steps
- you select SharePoint Online List.
- After you’ve selected Connect, you’ll be asked for your SharePoint URL.
- This URL is the one that you use to sign into your SharePoint site through a web browser.
- You can copy the URL from your SharePoint site and paste it into the connection window in Power BI.
- You don’t need to enter your full URL file path; you only need to load your site URL because, when you’re connected, you can select the specific list that you want to load.
- Depending on the URL that you copied, you might need to delete the last part of your URL, as illustrated in the following image.
- After you’ve entered your URL, select OK. Power BI needs to authorize the connection to SharePoint, so sign in with your Microsoft account and then select Connect.
- Choose the application data to import. After Power BI has made the connection with SharePoint, the Navigator window appears, as it does when you connect to other data sources.
- The window displays the tables and entities within your SharePoint site. Select the list that you want to load into Power BI Desktop.
- Similar to when you import from other data sources, you have the option to automatically load your data into a Power BI model or launch the Power Query Editor to transform your data before loading it.
- The first task you need to accomplish is to create your semantic models in Power BI so you can build visuals and other report elements.
The Sales department has many different semantic models of varying sizes. For security reasons, you aren’t allowed to import local copies of the data into your reports, so directly importing data is no longer an option. Therefore, you need to create a direct connection to the Sales department’s data source.
The following section describes how you can ensure that these business requirements are satisfied when you’re importing data into Power BI.
However, sometimes there may be security requirements around your data that make it impossible to directly import a copy. Or your semantic models may simply be too large and would take too long to load into Power BI, and you want to avoid creating a performance bottleneck.
Power BI solves these problems by using the DirectQuery storage mode, which allows you to query the data in the data source directly and not import a copy into Power BI.
DirectQuery is useful because it ensures you’re always viewing the most recent version of the data.
The three different types of storage modes you can choose from:
- Import
- DirectQuery
- Dual (Composite)
You can access storage modes by switching to the Model view, selecting a data table, and in the resulting Properties pane, selecting which mode that you want to use from the Storage mode drop-down list, as shown in the following visual.
Import mode – The Import mode allows you to create a local Power BI copy of your semantic models from your data source. You can use all Power BI service features with this storage mode, including Q&A and Quick Insights. Data refreshes can be scheduled or on-demand. Import mode is the default for creating new Power BI reports.
DirectQuery mode – The DirectQuery option is useful when you don’t want to save local copies of your data because your data won’t be cached.
- Instead, you can query the specific tables that you’ll need by using native Power BI queries, and the required data will be retrieved from the underlying data source.
- Essentially, you’re creating a direct connection to the data source. Using this model ensures that you’re always viewing the most up-to-date data, and that all security requirements are satisfied.
- Additionally, this mode is suited for when you have large semantic models to pull data from.
- Instead of slowing down performance by having to load large amounts of data into Power BI, you can use DirectQuery to create a connection to the source, solving data latency issues as well.
Dual (Composite mode) – In Dual mode, you can identify some data to be directly imported and other data that must be queried.
- Any table that is brought into your report is a product of both Import and DirectQuery modes.
- Using the Dual mode allows Power BI to choose the most efficient form of data retrieval.
Get data from Azure Analysis Services – Azure Analysis Services is a fully managed platform as a service (PaaS) that provides enterprise-grade semantic models in the cloud.
You can use advanced mashup and modeling features to combine data from multiple data sources, define metrics, and secure your data in a single, trusted tabular semantic model.
The semantic model provides an easier and faster way for users to perform ad hoc data analysis using tools like Power BI.
To resume the scenario, Tailwind Traders uses Azure Analysis Services to store financial projection data. You’ve been asked to compare this data with actual sales data in a different database.
Getting data from Azure Analysis Services server is similar to getting data from SQL Server, in that you can:
- Authenticate to the server.
- Pick the model you want to use.
- Select which tables you need.
Notable differences between Azure Analysis Services and SQL Server are:
- Analysis Services models have calculations already created.
- If you don’t need an entire table, you can query the data directly.
- Instead of using Transact-SQL (T-SQL) to query the data, like you would in SQL Server, you can use multi-dimensional expressions (MDX) or data analysis expressions (DAX).
Optimize performance in Power Query – The performance in Power Query depends on the performance at the data source level.
The variety of data sources that Power Query offers is wide, and the performance tuning techniques for each source are equally wide.
For instance, if you extract data from a Microsoft SQL Server, you should follow the performance tuning guidelines for the product.
Good SQL Server performance tuning techniques include
- index creation,
- hardware upgrades,
- execution plan tuning, and
- data compression.
Power Query takes advantage of good performance at the data source through a technique called Query Folding.
Query folding – The query folding within Power Query Editor helps you increase the performance of your Power BI reports.
- Query folding is the process by which the transformations and edits that you make in Power Query Editor are simultaneously tracked as native queries, or simple Select SQL statements, while you’re actively making transformations.
The reason for implementing this process is to ensure that these transformations can take place in the original data source server and don’t overwhelm Power BI computing resources.
You can use Power Query to load data into Power BI. Then use Power Query Editor to transform your data, such as renaming or deleting columns, appending, parsing, filtering, or grouping your data.
- Consider a scenario where you’ve renamed a few columns in the Sales data and merged a city and state column together in the “city-state” format.
- Meanwhile, the query folding feature tracks those changes in native queries.
- Then, when you load your data, the transformations take place independently in the original source, this ensures that performance is optimized in Power BI.
The benefits of query folding include:
- More efficiency in data refreshes and incremental refreshes. When you import data tables by using query folding, Power BI is better able to allocate resources and refresh the data faster because Power BI doesn’t have to run through each transformation locally.
- Automatic compatibility with DirectQuery and Dual storage modes. All DirectQuery and Dual storage mode data sources must have the back-end server processing abilities to create a direct connection, which means that query folding is an automatic capability that you can use. If all transformations can be reduced to a single Select statement, then query folding can occur.
The following scenario shows query folding in action. In this scenario, you apply a set of queries to multiple tables.
- After you add a new data source by using Power Query, and you’re directed to the Power Query Editor, you go to the Query Settings pane and right-click the last applied step, as shown in the following figure.
- If the View Native Query option isn’t available (not displayed in bold type), then query folding isn’t possible for this step, and you’ll have to work backward in the Applied Steps area until you reach the step in which View Native Query is available (displays in bold type).
- This process will reveal the native query that is used to transform the semantic model.
Native queries aren’t possible for the following transformations:
- Adding an index column
- Merging and appending columns of different tables with two different sources
- Changing the data type of a column
A good guideline to remember is that if you can translate a transformation into a Select SQL statement, which includes operators and clauses such as GROUP BY, SORT BY, WHERE, UNION ALL, and JOIN, you can use query folding.
While query folding is one option to optimize performance when retrieving, importing, and preparing data, another option is query diagnostics.
Query diagnostics – Another tool that you can use to study query performance is query diagnostics.
You can determine what bottlenecks may exist while loading and transforming your data, refreshing your data in Power Query, running SQL statements in Query Editor, and so on.
- To access query diagnostics in Power Query Editor, go to Tools in the Home ribbon.
- When you’re ready to begin transforming your data or making other edits in Power Query Editor, select Start Diagnostics in the Session Diagnostics section.
- When you’re finished, make sure that you select Stop Diagnostics.
- Selecting Diagnose Step shows you the length of time that it takes to run that step.
- This selection can tell you if a step takes longer to complete than others, which then serves as a starting point for further investigation.
- This tool is useful when you want to analyze performance on the Power Query side for tasks such as loading semantic models, running data refreshes, or running other transformative tasks.
Other ways to optimize query performance in Power BI include:
- Process as much data as possible in the original data source. Power Query and Power Query Editor allow you to process the data; however, the processing power that is required to complete this task might lower performance in other areas of your reports. Generally, a good practice is to process, as much as possible, in the native data source.
- Use native SQL queries. When using DirectQuery for SQL databases, such as the case for our scenario, make sure that you aren’t pulling data from stored procedures or common table expressions (CTEs).
- Separate date and time, if bound together. If any of your tables have columns that combine date and time, make sure that you separate them into distinct columns before importing them into Power BI. This approach will increase compression abilities.
Query folding is the ability for a Power Query query to generate a single query statement that retrieves and transforms source data. For more information, see Power Query query folding.
For a DirectQuery or Dual storage mode table, the Power Query query must achieve query folding.
For an Import table, it may be possible to achieve query folding. When the query is based on a relational source—and if a single SELECT statement can be constructed—you achieve best data refresh performance by ensuring that query folding occurs.
If the Power Query mashup engine is still required to process transformations, you should strive to minimize the work it needs to do, especially for large semantic models (previously known as datasets).
The following bulleted-list provides specific guidance.
- Delegate as much processing to the data source as possible: When all steps of a Power Query query can’t be folded, discover the step that prevents query folding.
- When possible, move later steps earlier in sequence so they may be factored into the query folding.
- Note the Power Query mashup engine may be smart enough to reorder your query steps when it generates the source query.
For a relational data source, if the step that prevents query folding could be achieved in a single SELECT statement—or within the procedural logic of a stored procedure—consider using a native SQL query, as described next.
Use a native SQL query: When a Power Query query retrieves data from a relational source, some sources can use a native SQL query.
The query can in fact be any valid statement, including a stored procedure execution.
If the statement produces multiple result sets, only the first be returned. Parameters can be declared in the statement, and we recommend that you use the Value.NativeQuery M function.
This function was designed to safely and conveniently pass parameter values. It’s important to understand that the Power Query mashup engine can’t fold later query steps, and so you should include all—or as much—transformation logic in the native query statement.
There are two important considerations you need to bear in mind when using native SQL queries:
- A native SQL query can potentially do more than retrieve data. Any valid statement can be executed (and possibly multiple times), including one that modifies or deletes data.
- It’s important that you apply the principle of least privilege to ensure that the account used to access the database has only read permission on required data.
Prepare and transform data in the source: When you identify that certain Power Query query steps can’t be folded, it may be possible to apply the transformations in the data source.
- The transformations could be achieved by writing a database view that logically transforms source data. Or, by physically preparing and materializing data, in advance of Power BI querying it.
A relational data warehouse is an excellent example of prepared data, usually consisting of pre-integrated sources of organizational data.
Resolve data import errors – While importing data into Power BI, you may encounter errors resulting from factors such as:
- Power BI imports from numerous data sources.
- Each data source might have dozens (and sometimes hundreds) of different error messages.
- Other components can cause errors, such as hard drives, networks, software services, and operating systems.
- Data often can’t comply with any specific schema.
- Relational source systems often have many people who are concurrently using the same data in the same database.
- Some relational systems and their administrators seek to limit a user from monopolizing all hardware resources by setting a query timeout.
- These timeouts can be configured for any timespan, from as little as five seconds to as much as 30 minutes or more.
Power BI Query Error: Timeout expired – This error indicates that you’ve pulled too much data according to your organization’s policies.
- Administrators incorporate this policy to avoid slowing down a different application or suite of applications that might also be using that database.
- You can resolve this error by pulling fewer columns or rows from a single table.
- While you’re writing SQL statements, it might be a common practice to include groupings and aggregations.
- You can also join multiple tables in a single SQL statement.
- Additionally, you can perform complicated subqueries and nested queries in a single statement.
- These complexities add to the query processing requirements of the relational system and can greatly elongate the time of implementation.
- If you need the rows, columns, and complexity, consider taking small chunks of data and then bringing them back together by using Power Query.
- For instance, you can combine half the columns in one query and the other half in a different query. Power Query can merge those two queries back together after you’re finished.
We couldn’t find any data formatted as a table – Occasionally, you may encounter the “We couldn’t find any data formatted as a table” error while importing data from Microsoft Excel.
- Fortunately, this error is self-explanatory. Power BI expects to find data formatted as a table from Excel. The error even tells you the resolution. Perform the following steps to resolve the issue:
- Open your Excel workbook, and highlight the data that you want to import.
- Press the Ctrl-T keyboard shortcut. The first row will likely be your column headers.
- Verify that the column headers reflect how you want to name your columns. Then, try to import data from Excel again. This time, it should work.
Couldn’t find the file – While importing data from a file, you may get the “Couldn’t find file” error.
- Usually, this error is caused by the file moving locations or the permissions to the file changing. If the cause is the former, you need to find the file and change the source settings.
- Open Power Query by selecting the Transform Data button in Power BI.
- Highlight the query that is creating the error.
- On the left, under Query Settings, select the gear icon next to Source.
- Change the file location to the new location.
Data type errors – Sometimes, when you import data into Power BI, the columns appear blank. This situation happens because of an error in interpreting the data type in Power BI.
- The resolution to this error is unique to the data source. For instance, if you’re importing data from SQL Server and see blank columns, you could try to convert to the correct data type in the query.
- Instead of using this query:
- SELECT CustomerPostalCode FROM Sales.Customers
- Use this query:
- SELECT CAST(CustomerPostalCode as varchar(10)) FROM Sales.Customers
- By specifying the correct type at the data source, you eliminate many of these common data source errors.
- You may encounter different types of errors in Power BI that are caused by the diverse data source systems where your data resides.
When examining the data, you discover several issues, including:
- A column called Employment status only contains numerals.
- Several columns contain errors.
- Some columns contain null values.
- The customer ID in some columns appears as if it was duplicated repeatedly.
- A single address column has a combined street address, city, state, and zip code.
Clean data has the following advantages:
- Measures and columns produce more accurate results when they perform aggregations and calculations.
- Tables are organized, where users can find the data in an intuitive manner.
- Duplicates are removed, making data navigation simpler. It will also produce columns that can be used in slicers and filters.
- A complicated column can be split into two, simpler columns. Multiple columns can be combined into one column for readability.
- Codes and integers can be replaced with human readable values.
- In Power Query Editor, the right-click context menus and Transform tab in the ribbon provide many of the same options.
The first step in shaping your initial data is to identify the column headers and names within the data and then evaluate where they are located to ensure that they are in the right place.
- When a table is created in Power BI Desktop, Power Query Editor assumes that all data belongs in table rows.
- However, a data source might have a first row that contains column names, which is what happened in the previous SalesTarget example.
- To correct this inaccuracy, you need to promote the first table row into column headers.
You can promote headers in two ways: by selecting the Use First Row as Headers option on the Home tab or by selecting the drop-down button next to Column 1 and then selecting Use First Row as Headers.
The next step in shaping your data is to examine the column headers. You can rename column headers in two ways.
- One approach is to right-click the header, select Rename, edit the name, and then press Enter.
- Alternatively, you can double-click the column header and overwrite the name with the correct name.
- You can also work around this issue by removing (skipping) the first two rows and then renaming the columns to the correct name.
Remove top rows
- When shaping your data, you might need to remove some of the top rows, for example, if they are blank or if they contain data that you do not need in your reports.
- Continuing with the SalesTarget example, notice that the first row is blank (it has no data) and the second row has data that is no longer required.
Remove columns – A key step in the data shaping process is to remove unnecessary columns. It is much better to remove columns as early as possible.
- One way to remove columns would be to limit the column when you get data from the data source.
- For instance, if you are extracting data from a relational database by using SQL, you would want to limit the column that you extract by using a column list in the SELECT statement.
- Removing columns at an early stage in the process rather than later is best, especially when you have established relationships between your tables.
- Removing unnecessary columns will help you to focus on the data that you need and help improve the overall performance of your Power BI Desktop semantic models and reports.
- Examine each column and ask yourself if you really need the data that it contains. If you don’t plan on using that data in a report, the column adds no value to your semantic model.
- Therefore, the column should be removed. You can always add the column later, if your requirements change over time.
You can remove columns in two ways.
- The first method is to select the columns that you want to remove and then, on the Home tab, select Remove Columns.
- Alternatively, you can select the columns that you want to keep and then, on the Home tab, select Remove Columns > Remove Other Columns.
Unpivot columns – Unpivoting is a useful feature of Power BI.
- You can use this feature with data from any data source, but you would most often use it when importing data from Excel.
- Unpivoting streamlines the process of creating DAX measures on the data later.
- By completing this process, you have now created a simpler way of slicing the data with the Year and Month columns
- If the data that you are shaping is flat (in other words, it has lot of detail but is not organized or grouped in any way), the lack of structure can complicate your ability to identify patterns in the data.
You can use the Pivot Column feature to convert your flat data into a table that contains an aggregate value for each unique value in a column.
- For example, you might want to use this feature to summarize data by using different math functions such as Count, Minimum, Maximum, Median, Average, or Sum
- On the Transform tab, select Transform > Pivot Columns.
- On the Pivot Column window that displays, select a column from the Values Column list, such as Subcategory name.
- Expand the advanced options and select an option from the Aggregate Value Function list, such as Count (All), and then select OK.
- Power Query Editor records all steps that you take to shape your data, and the list of steps are shown in the Query Settings pane.
- If you have made all the required changes, select Close & Apply to close Power Query Editor and apply your changes to your semantic model.
Rename a query – It’s good practice to change uncommon or unhelpful query names to names that are more obvious or that the user is more familiar with.
- For instance, if you import a product fact table into Power BI Desktop and the query name displays as FactProductTable, you might want to change it to a more user-friendly name, such as Products.
- Similarly, if you import a view, the view might have a name that contains a prefix of v, such as vProduct.
- People might find this name unclear and confusing, so you might want to remove the prefix.
Replace values – You can use the Replace Values feature in Power Query Editor to replace any value with another value in a selected column.
- Select the column that contains the value that you want to replace (Attribute in this case), and then select Replace Values on the Transform tab.
- In the Value to Find box, enter the name of the value that you want to replace, and then in the Replace With box, enter the correct value name and then select OK.
- In Power Query, you can’t select one cell and change one value, like you might have done in Excel
Replace null values – Occasionally, you might find that your data sources contain null values.
- For example, a freight amount on a sales order might have a null value if it’s synonymous with zero. If the value stays null, the averages will not calculate correctly.
- One solution would be to change the nulls to zero, which will produce the more accurate freight average.
- In this instance, using the same steps that you followed previously will help you replace the null values with zero.
Remove duplicates – You can also remove duplicates from columns to only keep unique names in a selected column by using the Remove Duplicates feature in Power Query.
The Copy option is at the top of the context menu, as shown in the following screenshot.
- Copying the table before removing duplicates will give you a comparison of the tables and will let you use both tables, if needed.
Best practices for naming tables, columns, and values:
- Naming conventions for tables, columns, and values have no fixed rules; however, we recommend that you use the language and abbreviations that are commonly used within your organization and that everyone agrees on and considers them as common terminology.
- A best practice is to give your tables, columns and measures descriptive business terms and replace underscores (“_”) with spaces.
- Be consistent with abbreviations, prefixes, and words like “number” and “ID.”
- Excessively short abbreviations can cause confusion if they are not commonly used within the organization.
- Also, by removing prefixes or suffixes that you might use in table names and instead naming them in a simple format, you will help avoid confusion.
- When replacing values, try to imagine how those values will appear on the report.
- Values that are too long might be difficult to read and fit on a visual.
- Values that are too short might be difficult to interpret.
- Avoiding acronyms in values is also a good idea, provided that the text will fit on the visual.
Evaluate and change column data types – When you import a table from any data source, Power BI Desktop automatically starts scanning the first 1,000 rows (default setting) and tries to detect the type of data in the columns.
- Some situations might occur where Power BI Desktop doesn’t detect the correct data type. Where incorrect data types occur, you’ll experience performance issues.
- You have a higher chance of getting data type errors when you’re dealing with flat files, such as comma-separated values (.CSV) files and Excel workbooks (.XLSX), because data was entered manually into the worksheets and mistakes were made.
- Conversely, in databases, the data types are predefined when tables or views are created.
A best practice is to evaluate the column data types in Power Query Editor before you load the data into a Power BI semantic model.
If you determine that a data type is incorrect, you can change it. You might also want to apply a format to the values in a column and change the summarization default for a column.
To continue with the scenario where you’re cleaning and transforming sales data in preparation for reporting, you now need to evaluate the columns to ensure that they have the correct data type. You need to correct any errors that you identify.
You evaluate the OrderDate column. As expected, it contains numeric data, but Power BI Desktop has incorrectly set the column data type to Text. To report on this column, you need to change the data type from Text to Date.
Implications of incorrect data types – Incorrect data types will prevent you from creating certain calculations, deriving hierarchies, or creating proper relationships with other tables.
For example, if you try to calculate the Quantity of Orders YTD, you’ll get the following error stating that the OrderDate column data type isn’t Date, which is required in time-based calculations.
Another issue with having an incorrect data type applied on a date field is the inability to create a date hierarchy, which would allow you to analyze your data on a yearly, monthly, or weekly basis.
The following screenshot shows that the SalesDate field isn’t recognized as type Date and will only be presented as a list of dates in the Table visual.
However, It’s a best practice to use a date table and turn off the auto date/time to get rid of the auto-generated hierarchy. For more information about this process, see Auto-generated data type documentation.
Change the column data type
- You can change the data type of a column in two places:
- in Power Query Editor and in the Power BI Desktop Report view by using the column tools.
- Another method is to select the data type icon next to the column header and then select the correct data type from the list.
Combine multiple tables into a single table – The ability to combine queries is powerful because it allows you to append or merge different tables or queries together. You can combine tables into a single table in the following circumstances:
- Too many tables exist, making it difficult to navigate an overly complicated semantic model.
- Several tables have a similar role.
- A table has only a column or two that can fit into a different table.
- You want to use several columns from different tables in a custom column.
Append queries – When you append queries, you’ll be adding rows of data to another table or query.
For example, you could have two tables, one with 300 rows and another with 100 rows, and when you append queries, you’ll end up with 400 rows.
When you merge queries, you’ll be adding columns from one table (or query) into another.
To merge two tables, you must have a column that is the key between the two tables.
Before you begin combining queries, you can remove extraneous columns that you don’t need for this task from your tables.
To complete this task, format each table to have only four columns with your pertinent information, and rename them so they all have the same column headers: ID, company, name, and phone.
you can combine the queries. On the Home tab on the Power Query Editor ribbon, select the drop-down list for Append Queries.
You can select Append Queries as New, which means that the output of appending will result in a new query or table, or you can select Append Queries, which will add the rows from an existing table into another.
Your next task is to create a new master table, so you need to select Append Queries as New.
This selection will bring you to a window where you can add the tables that you want to append from Available Tables to Tables to Append
Merge queries – When you merge queries, you’re combining the data from multiple tables into one based on a column that is common between the tables.
This process is similar to the JOIN clause in SQL. Consider a scenario where the Sales team now wants you to consolidate orders and their corresponding details (which are currently in two tables) into a single table.
You can accomplish this task by merging the two tables, Orders and OrderDetails, as shown in the following image. The column that is shared between these two tables is OrderID.
You can also choose how to join the two tables together, a process that is also similar to JOIN statements in SQL. These join options include:
Left Outer – Displays all rows from the first table and only the matching rows from the second.
Full Outer – Displays all rows from both tables.
Inner – Displays the matched rows between the two tables.
Shaping data means transforming the data: renaming columns or tables, changing text to numbers, removing rows, setting the first row as headers, and so on.
Combining data means connecting to two or more data sources, shaping them as needed, then consolidating them into a single query.
The Power Query Editor window opens. You can see the default steps applied so far, in the Query Settings pane under APPLIED STEPS.
Source: Connecting to the website.
Extracted Table from Html: Selecting the table.
Promoted Headers: Changing the top row of data into column headers.
Changed Type: Changing the column types, which are imported as text, to their inferred types.
Important Note – Power Query Editor records the steps sequentially, yet independently, of each other. To apply actions in a different sequence, you can move each applied step up or down.
The sequence of applied steps in Power Query Editor is important, and affects how the data is shaped. It’s also important to consider how one step might impact another subsequent step. For example, if you remove a step from the applied steps, subsequent steps might not behave as originally intended.
There are two primary ways of combining queries: merging and appending.
For one or more columns that you’d like to add to another query, you merge the queries.
For one or more rows of data that you’d like to add to an existing query, you append the query.
Power Query Editor creates a new column at the end of the query, which contains the contents of the table (query) that was merged with the existing query. All columns from the merged query are condensed into the column, but you can Expand the table and include whichever columns you want.
List Stores a list of values, For example, { 1, 2, 5, 10 } or { “hello”, “hi”, “good bye” }. A value of List in a cell is the simplest structured column and there is no table relationship.
When you expand the list, you have two options
Result for Expand to New Rows
Result for Extract Values with a semicolon (;) as a delimiter
Record Stores a set of named fields grouped into a unit. For example, [ FirstName = “Joe”, LastName = “Smith”, Birthdate = #date(2010, 1, 2) ]. A value of Record in the cell indicates a one-to-one table relationship with the current or primary table.
Table Stores a table that has a secondary relationship with the current data, which is the primary table. A value of Table in the cell indicates a one-to-many relationship with the current or primary table.
The query must have at least two tables joined in a relationship.
expand operation widens an Order table to include the Order_Details.ProductID, Order_Details.UnitPrice, and Order_Details.Quantity columns to bring together primary table Order rows and related table Order_Details rows
A Record and Table structured column represents a relationship in the data source which has a relational model, such as an entity with a foreign key association in an OData feed or a foreign key relationship in a SQL Server database.
To see a specific value without expanding the column, select a value in the structured column that has a Record value.
Profiling data is about studying the nuances of the data: determining anomalies, examining and developing the underlying data structures, and querying data statistics such as row counts, value distributions, minimum and maximum values, averages, and so on.
This concept is important because it allows you to shape and organize the data so that interacting with the data and identifying the distribution of the data is uncomplicated, therefore helping to make your task of working with the data on the front end to develop report elements near effortless.
Examine data structures – Two ways to examine the data structures
On the Model tab, you can edit specific column and table properties by selecting a table or columns, and you can transform the data by using the Transform Data button, which takes you to Power Query Editor.
Additionally, you can manage, create, edit, and delete relationships between different tables by using Manage Relationships, which is located on the ribbon.
Data anomalies are outliers within your data. Determining what those anomalies are can help you identify what the normal distribution of your data looks like and whether specific data points exist that you need to investigate further. Power Query Editor determines data anomalies by using the Column Distribution feature.
Column quality and Column distribution are shown in the graphs above the columns of data. Column quality shows you the percentages of data that is valid, in error, and empty. In an ideal situation, you want 100 percent of the data to be valid.
Column distribution shows you the distribution of the data within the column and the counts of distinct and unique values, both of which can tell you details about the data counts. Distinct values are all the different values in a column, including duplicates and null values, while unique values do not include duplicates or nulls. Therefore, distinct in this table tells you the total count of how many values are present, while unique tells you how many of those values only appear once.
The column Profile option shows two types of data
Value distribution graph tells you the counts for each distinct value in that specific column. When looking at the graph in the previous image, notice that the value distribution indicates that “Anthony Gross” appears the greatest number of times within the SalesPerson column and that “Lily Code” appears the least number of times. This information is particularly important because it identifies outliers. If a value appears far more than other values in a column, the Value distribution feature allows you to pinpoint a place to begin your investigation into why this is.
Column Statistics will also include how many zeroes and null values exist, along with the average value in the column, the standard deviation of the values in the column, and how many even and odd values are in the column. These statistics give you an idea of the distribution of data within the column, and are important because they summarize the data in the column and serve as a starting point to determine what the outliers are.
You might notice that M code is written top-down.
M code – select the View ribbon of Power Query and then select Advanced Editor.
Later steps in the process can refer to previous steps by the variable name to the left of the equal sign. Be careful about reordering these steps because it could ruin the statement dependencies.
Write to a query formula step by using the in a statement. Generally, the last query step is used as the final data set result.
How to Fill Down Blank or Null Cells in Power Query
Right-click the column header of the column you want to fill.
Select Fill > Down from the right-click menu.

Discover more from LR Virtual Classroom

Subscribe to get the latest posts sent to your email.

Discover more from LR Virtual Classroom

Share this:

Related

Published by Lashmi Bai Ravindrapandian