I recently came across a question on the Business Central forum: how do I get information about Posted Sales Shipments into Power BI? And this is not a standalone question; I get questions like this a lot in the trainings I deliver all over the world.
Many times I see answers suggesting using existing list pages and publishing them as web services, either manually or using the assisted setup: Setup Reporting Data. And although I like the Setup Reporting Data wizard in Business Central, I don’t like the results, or I should say: the performance.
It’s very easy to create new web services using the wizard, but it’s very difficult to use them correctly. Let me start with a simple example. Imagine, as in the question above, you want to analyze the posted sales shipments, and you want to display the following fields:
Let’s now use the Setup Reporting Data wizard to create this dataset. First we will create a dataset for the Header:
Then we will create a dataset for the lines:
And let’s filter the lines on only items:
When both datasets are created and published you can find them in the web services page:
Now Power BI Desktop can find these datasets using the Business Central connector:
As you can see, both datasets are here, but the Power BI connector lists all fields from the source pages and does not apply any filters. (Make sure you add the No. and Document No. fields, so you can join the two datasets later.)
Select the Transform Data button to open the Query Editor:
Now you can choose to join the two datasets in the Power Query editor, or import both datasets and link them in the relationships window of the Power BI report. In my opinion it’s better to do this as early as possible, so in the Power Query editor.
Then you can disable load for the two original datasets, since we only need the merged one:
And voila the result:
As you can imagine, every time you refresh the report (manually or scheduled) both queries will be executed and joined, and the joined data will be loaded into the report again. Not very optimal performance-wise, if you ask me.
Because the sources are OData page objects, SELECT * queries will be sent to the SQL database, and the expensive load & join in Power Query will not make the report any faster. When working with demo data you will not notice this, but in real life, with real data, you will have to wait quite some time for the report to refresh.
Can we do this in a more optimal way?
Yes we can, by replacing the two page datasets with one query dataset. So let’s create a query in AL:
In this query I also fetch the required data from the Sales Shipment Header and Sales Shipment Line tables, but when it executes, only the selected columns are fetched from the SQL database (no SELECT *). And because of the totaling method (Sum), a GROUP BY clause is added to the SQL statement, aggregating the data into a subtotal of Quantity by Item, Posting Date and Customer. This is a much faster and less expensive way to get the data out of the SQL database.
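As a rough sketch, such a query could look like the AL below. The object ID, names, and exact field selection are my assumptions, since the post does not spell them out; adjust them to your own extension:

```al
query 50100 "Posted Sales Shipments"
{
    QueryType = Normal;
    Caption = 'Posted Sales Shipments';

    elements
    {
        dataitem(SalesShipmentHeader; "Sales Shipment Header")
        {
            column(DocumentNo; "No.") { }
            column(PostingDate; "Posting Date") { }
            column(SellToCustomerName; "Sell-to Customer Name") { }

            dataitem(SalesShipmentLine; "Sales Shipment Line")
            {
                // Join the lines to the header, replacing the manual
                // merge in the Power Query editor.
                DataItemLink = "Document No." = SalesShipmentHeader."No.";
                // Filter on item lines only, like the wizard filter above.
                DataItemTableFilter = Type = const(Item);

                column(ItemNo; "No.") { }
                column(Description; Description) { }
                column(Quantity; Quantity)
                {
                    // The Sum method turns all other columns into a
                    // GROUP BY clause in the generated SQL statement.
                    Method = Sum;
                }
            }
        }
    }
}
```

Because of `Method = Sum` on the Quantity column, the other columns become the grouping keys, so the database returns the aggregated rows directly instead of every shipment line.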
Now publish your query and import it in Power BI:
We have the same result, but much faster. It’s much easier in Power BI, and it’s much faster to retrieve and refresh the data. That’s the power of queries.
Queries are also much easier to maintain: you manage them in your extension, and since your extension is linked to Git and/or DevOps, you get source control on your query objects as well, instead of creating them directly in the application using some kind of wizard that generates stuff that’s not really performant…
And even your Power BI report files can be managed in Git and DevOps. How cool is that!
Now I can hear you thinking: wait a second. If you copy the URLs of the two queries generated with the Setup Reporting Data wizard, there are filters in the OData URL, so why not use those? Wouldn’t that be faster?
No and yes. Yes, it would be faster than importing the OData pages as explained above, but it’s still a lot slower compared to using a query. And remember, you would still need a join in the Power Query editor. A query will beat a page anytime!
Important to mention: although query objects are faster than page objects, especially in reporting scenarios using Power BI or Excel with Power Query, they can still get slow. Depending on the amount of data you are importing, for example from Ledger Entry tables, even query objects can take some time. That’s why there is an option to implement a scheduled refresh in the Power BI service. Most customers in the SMB world use a Power BI Pro account, which means the refresh will always be a full refresh; only a Premium Power BI account has incremental refresh capability, but it comes at a price: https://powerbi.microsoft.com/en-us/pricing/
Using Power Query, there’s a method to implement incremental refresh without a Premium account: split the load into two separate datasets using a date field, one that you import only once and a second that you refresh. (I will write a blog post about that method soon.) The disadvantage of this approach is that the split date always needs to be set manually, so it’s not really dynamic, but it might be an interesting approach for loading big tables (ledgers).
In the end it’s usually a question of how ‘old’ the data in your report can be. Does it really need to be refreshed every x minutes, or every hour, or can it be a little ‘older’? Power BI is a BI solution, and BI solutions rarely report on live data. Some kind of refresh mechanism is usually involved, and sometimes even a data warehouse can help a lot…
You can download the demo code here: