
Getting Started with Microsoft Fabric Licensing

Microsoft Fabric is a Software-as-a-Service (SaaS) platform which enables you to build an end-to-end analytics solution without the need to spin up complex infrastructure. If you want to know more about Microsoft Fabric, check out our introduction blog post, which you can find here.

Since Microsoft Fabric was announced there have been many questions about licensing, and in this blog post I will go through everything you need to know!

What are capacities?

Capacities live inside a tenant and provide the compute power behind your processes within Fabric. Much like Power BI capacities, Fabric has its own SKUs, which offer varying levels of performance depending on your needs. Compute is represented as 'Capacity Units' (CUs), which measure the amount of compute power within each SKU.
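As a quick illustration, the number in each F SKU corresponds directly to its CU count (an F64 capacity provides 64 CUs), which we can sketch in a few lines of Python:

```python
# The Fabric F SKUs covered later in this post. The CU count matches the
# number in the SKU name, e.g. F64 provides 64 Capacity Units.
FABRIC_SKUS = ["F2", "F4", "F8", "F16", "F32", "F64",
               "F128", "F256", "F512", "F1024", "F2048"]

def capacity_units(sku: str) -> int:
    """Return the number of CUs for a given F SKU, e.g. 'F64' -> 64."""
    if not sku.startswith("F"):
        raise ValueError(f"Not a Fabric F SKU: {sku}")
    return int(sku[1:])

print(capacity_units("F64"))   # 64
print(capacity_units("F2048")) # 2048
```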

Structuring your tenant

When you begin to build out your data platform in Fabric, it is important to consider how you will structure capacities and workspaces; check out Craig's blog, which goes into more detail on how to architect a Fabric platform. The image below shows a couple of examples of how this could work. The example on the left shows several capacities separated by department, each containing its relevant workspaces. The example on the right is more granular: the tenant is split by department, and the capacities are split by team as well as geographical location. I would draw a parallel here with how 'Administrative Units' work in Azure.

How do I enable a capacity on my tenant?

You will need to be an owner or contributor of the subscription that you are adding the Fabric capacity to. Capacities are managed in the Azure Portal like any other resource: they can be added to a resource group, have tags, and belong to a specific region. You can read more about purchasing information here.

Which capacity do I need?

It is important to highlight a couple of points about OneLake before we get into choosing a SKU. The Microsoft documentation is fairly vague at this point, but based on some comments it looks like some storage will be included with our OneLake capacity. What is not clear is how much. Exceeding the amount initially provided will require you to purchase additional storage on top. Pricing for OneLake storage will be comparable to Azure Data Lake Storage (ADLS) pricing.

Source: Announcing Microsoft Fabric Capacities - Microsoft Blog

Now comes the time to make a decision: which SKU should I choose?

There is no way to truly determine what you will need aside from trial and error. Starting from a small SKU and upgrading as the need arises is the most sensible approach; you might see some performance issues until you find the sweet spot, but this will be more cost-effective. You can also take advantage of a 60-day free trial, which gives you access to the F64 SKU, so this could be a nice way to test the waters on the sort of compute you need for your project.

Let’s look at the available SKUs in the table below:

North Europe Fabric SKUs

I've chosen North Europe as an example here, but you can select your own region within the Azure Portal to see prices relevant to your location. Not all regions offer every SKU; for example, UK South will not give you the option of F512, F1024 or F2048.

SKU Features and Power BI comparison

Generally speaking, each SKU gives you more compute as you work your way up from F2, but there are a few worth calling out. I've added another table below which shows the equivalent Power BI SKU for reference:

F2 & F4 – Smaller SKUs with a low-cost entry point; these did not exist previously within Power BI.

F64 – Equivalent to a Power BI P1 SKU. P1 is the Power BI Premium capacity, and if you already have this enabled on the Power BI side you will be able to use all Fabric capabilities. This capacity and higher will also allow users with a free license to view content.

F2048 – New Fabric SKU, no equivalent from Power BI SKUs.

You will still need a Power BI Pro license alongside your Fabric capacity if you want to create reports within Power BI.
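A minimal Python sketch of the F-to-P comparison above; any SKU without a Premium equivalent (such as F2, F4 or F2048) simply returns None:

```python
# Lookup table based on the SKU comparison above. SKUs with no Power BI
# Premium equivalent (e.g. F2, F4, F2048) are deliberately absent.
F_TO_P = {
    "F64": "P1",
    "F128": "P2",
    "F256": "P3",
    "F512": "P4",
    "F1024": "P5",
}

def power_bi_equivalent(f_sku):
    """Return the equivalent Power BI Premium SKU, or None."""
    return F_TO_P.get(f_sku)

print(power_bi_equivalent("F64"))    # P1
print(power_bi_equivalent("F2048"))  # None
```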

Has Power BI licensing changed?

A tiny bit, but it is pretty straightforward.

Power BI Free is now called Microsoft Fabric Free. Same functionality, just a name change.

Power BI Pro has stayed the same and will only give you access to Power BI features.

Power BI Premium Per User is the same and gives you access to Power BI Premium features such as dataflows. This is considered 'partial' Fabric access, as it only includes the Power BI items found within Microsoft Fabric.

Check out the licensing documentation here.

How does Microsoft Fabric handle my workloads?

When monitoring data solutions, you tend to see the same patterns in performance. Typically, you will see spikes over the course of the day as key events take place, ranging from data ingestion to specific users making use of the platform. This can be problematic: you have limited compute power across the whole platform, but you need more of it in some parts and less in others. There are some interesting features within Fabric which aim to solve this problem.

Now if we look at the same day with smoothing (right-hand image above), we can see that those spikes are still there, but they are far less prominent and sit below the capacity line. Smoothing works by borrowing compute power from times of low or no use, so that the total average load stays below the capacity limit; think of it as the average of your total compute within your capacity. This gives us a lot more freedom with scheduling jobs, without having to spend time working out the most optimal order of things to prevent bottlenecks. But what if we need more performance? How much can we really throw at this? Let's talk about bursting.
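The idea can be shown with a toy sketch. The numbers below are made up, and a simple whole-day average stands in for Fabric's actual smoothing algorithm, but it shows why a spiky day can still fit within a capacity once the load is averaged out:

```python
# Toy illustration of smoothing: hourly CU usage with ingestion spikes
# that individually exceed the capacity, averaged over the day.
# (Hypothetical numbers; Fabric's real smoothing windows differ.)
capacity_limit = 64  # CUs, e.g. an F64 capacity

hourly_usage = [10, 12, 90, 15, 8, 70, 12, 10, 85, 14, 9, 11]

peak_load = max(hourly_usage)
average_load = sum(hourly_usage) / len(hourly_usage)

print(f"Peak load: {peak_load} CUs")            # spikes exceed the capacity
print(f"Average load: {average_load:.1f} CUs")  # smoothed view sits below it
print("Within capacity once smoothed:", average_load <= capacity_limit)
```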

Bursting is a mechanism that kicks in automatically when the resources are available to boost performance. It allocates additional compute resources so that jobs can complete in a faster time. The additional compute used can far exceed the purchased capacity available, giving you huge job acceleration; bursting is combined with smoothing, which prevents the resulting spikes.

But even with smoothing and bursting, it is still possible to overload your capacity. As your performance needs grow, there will come a time when the capacity needs to be upgraded because it has been overloaded.

What happens if I overload my capacity?

As you start to approach the point of overload, warnings will be shown to the capacity administrator. But let's say they are on holiday and the capacity doesn't get upgraded (or high-load projects aren't scaled down): will everything stop working? No, but there are some things to consider. Any scheduled operation will run as normal. Interactive jobs are the first to suffer performance effects during an overload and will be throttled.
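The behaviour just described can be sketched as a simple rule (a hypothetical simplification; Fabric's actual throttling policy is more nuanced than this):

```python
# Hypothetical sketch of overload behaviour: scheduled jobs keep running,
# while interactive jobs are the first to be throttled.
def handle_job(job_type, capacity_overloaded):
    if job_type == "scheduled":
        return "run as normal"
    if job_type == "interactive" and capacity_overloaded:
        return "throttled"
    return "run as normal"

print(handle_job("scheduled", capacity_overloaded=True))    # run as normal
print(handle_job("interactive", capacity_overloaded=True))  # throttled
print(handle_job("interactive", capacity_overloaded=False)) # run as normal
```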

How do I fix overloading?

Before rushing to increase a capacity there are a couple of things to consider first.

- Is the overload a one-off issue or a rare event?

You might decide that, because the overload event is something you see very rarely and you normally have very good total utilisation of your chosen capacity, upgrading it would be overkill.

- Are specific higher-load projects causing the overload?

One or more projects may be using a huge amount of capacity due to potential inefficiencies. Before upgrading, it could be a worthwhile exercise to check those projects and see if you can make performance improvements that bring usage down.

If, after considering those two questions, your capacity usage truly does need to be higher, then you can upgrade your capacity to a more fitting size.
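Putting those two checks together, the decision could be sketched (hypothetically, with made-up parameter names) like this:

```python
# Hypothetical sketch of the upgrade decision described above.
def should_upgrade_capacity(overload_is_rare, projects_can_be_optimised):
    """Only upgrade when the overload recurs and can't be fixed by
    tuning the individual high-load projects."""
    if overload_is_rare:
        return False  # a rare one-off doesn't justify a bigger SKU
    if projects_can_be_optimised:
        return False  # tune the inefficient projects first
    return True

print(should_upgrade_capacity(overload_is_rare=False,
                              projects_can_be_optimised=False))  # True
```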

Monitoring Usage with Capacity Metrics

Even though the capacity performance features within Fabric are fully automated, it is still important to proactively monitor the tenant to track performance against your chosen capacity. You may find you do have inefficiencies somewhere, such as projects with high load requirements that can be improved to avoid the need to upgrade your capacity. Capacity Metrics can be used to gain visibility of everything running within the tenant. As we saw in the previous section, overloading a capacity can be prevented with proactive monitoring.

You can see how to enable this here.

Conclusion

In this blog post we have looked at what capacities are, which one to pick, and how they compare to Power BI SKUs. We will be watching capacities as we move towards GA to see what changes and features we can expect.

Check out our Microsoft Fabric page for more information