The Data Massagist by Pablo Junco

The Economics of Modern Data Platforms (Microsoft Fabric vs. Azure Databricks)

March 18, 2026 · 21 min read
Databricks MS Fabric Newsletter

From messy data to measurable outcomes—governed platforms that power agentic AI.

Created on 2026-03-12 11:01

Published on 2026-03-18 13:01

Hello there, and welcome to Edition #7 of The Data Massagist Newsletter.

If you’re not subscribed yet, now is the perfect time—join a growing community of 2,800+ professionals exploring the future of data, AI, and modern data platforms. Click here to subscribe on LinkedIn.

In the last few editions, I’ve focused on some of the most important architectural decisions organizations face when building a modern data platform.

In Edition #6, I shared how to design a modern data platform using two of the most powerful platforms in the market today (Microsoft Fabric and Azure Databricks) and, more importantly, how to interconnect them to get the best of both worlds.

In Edition #5, I explored a challenge many organizations are currently facing: how to migrate legacy environments—such as traditional data warehouses or Hadoop-based platforms—into modern data ecosystems.

And in Edition #3, I introduced a concept that resonated strongly with many of you (thank you for the feedback!): The Seven Layers of a Real Data Platform—not just as a technical stack, but as a set of business stages that connect technology with real organizational value.

If data is the fuel of AI, then the data platform is the engine.

But there is another reality every data leader, software architect, and data engineer must understand: how that engine is priced.

Today, I want to focus on something that often surprises teams once they start scaling: how you actually get charged when using Microsoft Fabric and Azure Databricks.

Understanding pricing is not just a financial exercise—it’s an architectural decision. The way workloads are designed, scheduled, and executed has a direct and often dramatic impact on cost efficiency.

In this edition, we’ll break down how Microsoft Fabric and Azure Databricks really charge you—starting with Capacity Units (CUs) and Databricks Units (DBUs), what they mean, and why they matter.

Then we’ll go beyond pricing tables to explore what truly drives cost in real-world platforms—compute, storage, data movement, and query behavior—and how these differ between both ecosystems.

Finally, we’ll shift from mechanics to architecture: how to organize environments and teams for sustainable scale, followed by practical strategies to optimize cost without slowing down innovation.

So, let’s get started.

1.- Compute Consumption-based Platforms

1.1.- Pricing fundamentals

Compute represents the active processing power used to run queries, pipelines, notebooks, and AI workloads—and it is the primary driver of cost in any modern analytics platform.

Because compute is elastic, shared, and workload-dependent, understanding how it is provisioned, consumed, and governed is critical for cost control. Poorly sized or always-on compute leads to wasted spend, while well-managed compute enables teams to scale on demand without losing financial discipline.

A simple but critical recommendation: never move a Proof of Concept (PoC) or pilot environment directly into production. These environments were designed for learning and speed—not efficiency, governance, or scale.

In practice, compute management is what allows organizations to balance performance, concurrency, and predictability as usage grows—turning analytics from a fixed infrastructure cost into a controllable, business-aligned operating expense.

Both Microsoft Fabric and Azure Databricks are consumption-based platforms—but they express consumption very differently:

  • Fabric asks: How big is your engine, and how long is it running?

  • Databricks asks: What type of engine are you running, for how long, and how often?

1.1.a.- Microsoft Fabric Capacity Units (CUs)

Microsoft Fabric separates compute and storage costs. Compute is priced using Capacity Units (CUs), where you purchase a fixed capacity and all workloads draw from that shared pool—across Analytics, Real-Time Intelligence, Semantic Models, Power BI, and more.

Compute in Fabric is Priced as Capacity Units

Fabric runs on pre-provisioned capacities (F-SKUs) that are billed continuously. These capacities do not natively auto-scale based on demand.

Capacities range from small (F2) to very large (F2048), with pricing varying by Azure region (for example: Brazil South, East US 2, West Europe, or Spain Central). Each capacity can serve one or multiple workspaces.

Microsoft Fabric related components


Microsoft offers two pricing models:

  • Pay-As-You-Go (PAYG): Maximum flexibility with no long-term commitment

  • Reserved Capacity: 12-month commitment for predictable workloads at a lower cost

Once the cost behavior of the solution is understood, your organization can take advantage of commitment-based discounts—such as reserved instances (RIs), savings plans, MACCs, and committed-use discounts—to optimize spend. For example, with Microsoft Fabric, committing to a reservation of at least one year can deliver savings of approximately 41% compared to PAYG pricing.

Microsoft Fabric also supports autoscale billing for Spark workloads, allowing jobs to temporarily consume additional PAYG capacity beyond the reserved baseline. This is currently the only true autoscaling mechanism in Fabric.
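To make the CU math concrete, here is a rough back-of-the-envelope sketch. The per-CU-hour rate and the 730-hour billing month are illustrative assumptions, not official Azure prices (actual rates vary by region and SKU), and the 41% figure is the approximate 1-year reservation saving mentioned above.

```python
# Hypothetical illustration of Fabric capacity pricing. The rate below is a
# placeholder, not an official Azure price: always check the Azure pricing
# page for your region and SKU.

HOURS_PER_MONTH = 730  # common Azure convention for a full billing month

def monthly_capacity_cost(cu: int, payg_rate_per_cu_hour: float,
                          reservation_discount: float = 0.41) -> dict:
    """Estimate monthly cost of an F-SKU at PAYG vs. a 1-year reservation.

    `reservation_discount` reflects the ~41% savings cited for a 1-year
    commitment compared to PAYG pricing.
    """
    payg = cu * payg_rate_per_cu_hour * HOURS_PER_MONTH
    reserved = payg * (1 - reservation_discount)
    return {"payg": round(payg, 2), "reserved": round(reserved, 2)}

# Example: an F64 capacity at an assumed $0.18 per CU-hour
print(monthly_capacity_cost(64, 0.18))
```

Because an F-SKU bills continuously whether or not it is busy, the estimate depends only on size and hours, which is exactly why pausing idle capacities matters so much in Fabric.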

1.1.b.- Azure Databricks Units (DBUs)

Azure Databricks charges compute through Databricks Units (DBUs), representing compute consumption per second based on workload type, cluster configuration, and runtime.

Different workloads consume DBUs at different rates, including:

  • All-purpose (interactive) compute

  • Jobs compute / Jobs Light

  • SQL compute / SQL Pro

  • Serverless workloads

Azure Databricks related components

DBUs are billed per second while compute is running and cover the Databricks-managed layer, including:

  • Cluster orchestration

  • Spark runtime

  • Job scheduling

  • Notebooks

  • Governance capabilities

Azure Databricks supports:

  • Pay-As-You-Go (per-second billing)

  • Committed-use discounts (e.g., pre-purchased DBUs or savings plans)
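As a sketch of how per-second DBU billing composes, the toy calculator below uses made-up DBU rates and a hypothetical 8-DBU/hour job cluster; real rates depend on workload type, tier, and region.

```python
# Hypothetical sketch of Azure Databricks compute billing. DBU rates differ
# by workload type (all-purpose vs. jobs vs. SQL vs. serverless); the rates
# here are illustrative placeholders, not official prices.

ILLUSTRATIVE_DBU_RATES = {  # assumed $ per DBU
    "all_purpose": 0.55,
    "jobs": 0.30,
    "sql": 0.22,
}

def dbu_cost(workload: str, dbu_per_hour: float, runtime_seconds: int) -> float:
    """Per-second billing: DBUs accrue only while the compute is running."""
    rate = ILLUSTRATIVE_DBU_RATES[workload]
    dbus_consumed = dbu_per_hour * runtime_seconds / 3600
    return round(dbus_consumed * rate, 2)

# A job cluster emitting 8 DBU/hour that runs for 45 minutes:
print(dbu_cost("jobs", 8, 45 * 60))
```

The same runtime on an all-purpose cluster would cost noticeably more, which is why moving scheduled workloads from interactive clusters to job clusters is usually the first optimization teams make.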

1.2.- What do they have in common?

Despite their different mechanics, both platforms share several key principles:

  • Consumption-based economics: You pay for usage, not licenses

  • Compute is the main cost driver: Storage is cheap; execution is not

  • Architecture drives cost: Poor design burns money quickly

  • Elasticity matters: Scaling correctly is more important than size

In reality, compute is where most of the money is spent—and yes, where Microsoft and Databricks primarily generate revenue.

In Fabric, long-running pipelines, Spark workloads, and semantic model refreshes continuously consume capacity.

In Databricks, cost is driven by cluster size, runtime duration, and interactive usage patterns.

One of the most common hidden costs in both platforms is always-on compute. Ultimately, cost is not an afterthought—it is the direct result of architectural decisions.

1.3.- What really differentiates them?

The key difference between Microsoft Fabric and Azure Databricks lies in how they handle predictability, isolation, and peak demand.

1.3.a.- Microsoft Fabric

Fabric is built around a shared, capacity-based model that prioritizes efficiency and elasticity over strict performance guarantees.

Workloads share a fixed pool of CUs, and Fabric uses two mechanisms to handle spikes:

  • Bursting: Allows workloads to temporarily use more compute than the purchased capacity so queries and jobs finish faster. The extra compute is borrowed short‑term, not free.

  • Smoothing: Spreads that extra compute usage over time, averaging consumption across minutes or hours so short spikes don’t immediately cause throttling—but sustained overuse still counts.

Interactive workloads are smoothed over minutes, while background workloads are smoothed over a 24-hour window.

When sustained demand exceeds capacity, Fabric applies queuing, throttling, or request rejection to maintain stability.
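A toy model helps build intuition here (this is a conceptual illustration, not Fabric's actual algorithm): consumption is averaged over a rolling window, so an isolated spike above the purchased capacity is absorbed, while sustained overuse eventually pushes the smoothed average over the limit and triggers throttling.

```python
from collections import deque

def smoothed_usage(samples, window):
    """Toy model of Fabric smoothing: average CU consumption over a rolling
    window so a short burst above capacity does not immediately throttle.
    Illustrative only -- not Fabric's real smoothing algorithm."""
    recent = deque(maxlen=window)
    out = []
    for s in samples:
        recent.append(s)
        out.append(sum(recent) / len(recent))
    return out

capacity_cu = 64
# One-minute CU samples: a single spike to 100, then sustained overuse.
samples = [40, 40, 100, 40, 40, 100, 100, 100, 100]
for raw, avg in zip(samples, smoothed_usage(samples, window=4)):
    status = "THROTTLE" if avg > capacity_cu else "ok"
    print(f"raw={raw:3d}  smoothed={avg:6.1f}  {status}")
```

Running this, the single spike never pushes the 4-sample average past the 64-CU capacity, but the sustained run of 100-CU samples does, mirroring the "short spikes are absorbed, sustained overuse still counts" behavior described above.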

Monitoring throttling in Microsoft Fabric

This is by design: Fabric guarantees service availability, not workload performance.

Isolation is logical rather than physical and is achieved by assigning workloads to different capacities.

1.3.b.- Azure Databricks

Databricks follows a compute-first model, where predictability and isolation are explicit architectural choices.

Workloads run on dedicated clusters or SQL warehouses, consuming only the compute assigned to them.

During peak demand, Databricks scales by:

  • Starting new clusters

  • Autoscaling existing ones

If capacity is insufficient, workloads fail fast instead of being implicitly queued.

This results in highly predictable performance and strong isolation boundaries.

Databricks guarantees service availability (99.95% SLA), while performance depends on how compute is provisioned and managed.

2.- How is storage billed?

Storage is rarely the primary cost driver. Both platforms rely on low-cost object storage—but inefficient access patterns can significantly increase total cost.

2.1.- Microsoft Fabric

In Microsoft Fabric, data storage is billed independently through OneLake, a single logical data lake per tenant. When data stays in OneLake, it can be reused across workloads (Lakehouses, Warehouses, Eventstreams, KQL databases, Power BI, etc.) without being duplicated.

OneLake, beyond a storage for Microsoft Fabric

OneLake storage pricing always follows the PAYG model per GB per month, with these characteristics:

  • Charged per GB stored, prorated during the month

  • Soft-deleted data continues to incur charges until it is permanently deleted

  • Costs appear in the Azure subscription (not at the Fabric tenant level)

  • Storage usage can be monitored in the Fabric Capacity Metrics app

For mirrored data, a portion of storage is included for free based on the Fabric capacity SKU (for example, an F64 includes 64 TB of mirrored storage). However, if the capacity is paused, mirrored storage becomes fully billable.
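A minimal sketch of that allowance logic, assuming the pattern that each F-SKU includes free mirrored storage equal to its CU count in terabytes (as in the F64 example above):

```python
def billable_mirrored_tb(sku_cu: int, mirrored_tb: float,
                         capacity_paused: bool = False) -> float:
    """Sketch of Fabric's mirrored-storage allowance: each F-SKU includes
    free mirrored storage equal to its CU count in TB (e.g. F64 -> 64 TB).
    If the capacity is paused, the allowance no longer applies and all
    mirrored data becomes billable. Illustrative logic only."""
    free_tb = 0 if capacity_paused else sku_cu
    return max(0.0, mirrored_tb - free_tb)

print(billable_mirrored_tb(64, 80))                        # only the excess
print(billable_mirrored_tb(64, 80, capacity_paused=True))  # everything bills
```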

Some internal storage used by native Fabric items—such as Power BI import semantic models or Fabric SQL Database internals—is included in the service cost and not billed separately in OneLake.

Finally, storing data itself does not consume CUs, but accessing it does. Any reads, writes, or metadata operations in OneLake consume CUs, and the consumption depends on the operation type, data volume, and number of requests.

2.2.- Azure Databricks

Azure Databricks does not include native storage. Instead, it relies on Azure Data Lake Storage Gen2 (ADLS Gen2) or Azure Blob Storage for persistent data. By design, Databricks also separates compute from storage: data resides in customer-owned Azure storage accounts, while Databricks provides the compute, orchestration, and analytics layer on top.

Most common enterprise pattern for storage accounts with Azure Databricks

All data used by Azure Databricks workloads—tables, files, checkpoints, logs, and outputs—is billed according to standard Azure Storage pricing in the Azure subscription where the storage account lives. This includes per-GB storage costs, transaction charges (reads, writes, lists), and optional features such as redundancy, lifecycle policies, and data movement. Databricks does not bundle or mark up storage costs; they appear as regular Azure Storage charges, completely independent from Databricks Units (DBUs).

DBUs cover only the compute and platform layer, including cluster orchestration, Spark runtime, SQL warehouses, job scheduling, notebooks, and governance features. Persistent storage is always external, giving customers full control over storage tiers, retention policies, security, networking, and encryption.

Importantly, storage access itself does not directly consume DBUs. DBU consumption depends on how long clusters run, their size, and the type of workload, not on the number of storage I/O operations. Because storage is external and always available while compute is ephemeral, data remains accessible even when clusters are stopped, and costs can be controlled by auto-terminating clusters, using job clusters, and applying storage lifecycle policies.
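The cost controls mentioned above (auto-termination, job clusters, tagging) live directly in the cluster definition. Below is a sketch of a cluster spec in the shape accepted by the Databricks Clusters API; the runtime label, VM size, and tag values are illustrative assumptions, so verify them against your workspace and the Clusters API reference.

```python
import json

# Sketch of an Azure Databricks cluster spec with the cost controls
# discussed above. Field values are illustrative assumptions; verify the
# runtime version and node type against your own workspace.
cluster_spec = {
    "cluster_name": "etl-nightly",
    "spark_version": "15.4.x-scala2.12",   # assumed LTS runtime label
    "node_type_id": "Standard_D4ds_v5",    # assumed Azure VM size
    "autoscale": {"min_workers": 2, "max_workers": 8},
    "autotermination_minutes": 20,         # shut down idle compute
    "custom_tags": {                       # cost attribution for FinOps
        "team": "data-platform",
        "environment": "prod",
    },
}
print(json.dumps(cluster_spec, indent=2))
```

Because storage is external, a spec like this can terminate aggressively: the data stays in ADLS Gen2 either way, and the tags let finance attribute every DBU the cluster consumed.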

Since yesterday’s announcement by Anavi Nahar (Head of Product, Azure Databricks), Serverless Workspaces in Azure Databricks are now generally available. This GA milestone marks a shift toward a fully managed, serverless workspace model that provides instant compute and Databricks-managed default storage out of the box. While this greatly simplifies adoption and governance through Unity Catalog, it also introduces a new storage cost model that teams must understand from day one. Storage is billed separately using Databricks Storage Units (DSUs), and customers pay for what is stored and retained over time—including managed tables, snapshots, and time-travel history. As a result, default storage reduces operational complexity but still requires the same level of cost awareness and lifecycle management as any other enterprise storage layer.

Finally, for organizations using Microsoft Fabric, Azure Databricks can read (and soon write) data stored in OneLake without duplicating it, enabling cross-platform data access while keeping storage managed in Azure.

3.- The implications of Data Movement

Moving data is often the hidden cost driver in modern platforms. The biggest cost multipliers include:

  • Data duplication across systems

  • Cross-region access

  • Repeated transformations

  • Redundant ingestion pipelines

The architectural principle that consistently reduces cost is simple:

Architectures that minimize data movement and maximize data locality are always cheaper.

3.1.- Microsoft Fabric

In Microsoft Fabric, data movement costs tend to appear indirectly through capacity consumption, rather than as explicit network charges. As mentioned, Microsoft Fabric is built around OneLake, where all workloads share the same underlying data layer, which minimizes physical data movement. However, logical data movement still consumes compute, and that is where costs typically emerge.

I recommend keeping the following considerations in mind:

  • Reading and writing data in OneLake consumes CUs. Even though storage is billed separately, every read, write, or transformation operation consumes CUs. Re-processing the same datasets multiple times directly impacts capacity utilization and can quickly increase costs.

  • Copy-based integration patterns can be expensive. Using pipelines to repeatedly copy data between lakehouses, warehouses, or semantic layers increases CU consumption and may lead to throttling under sustained workloads.

  • Cross-region scenarios amplify CU consumption. If Fabric capacity is deployed in one region while data sources or consumers are located in another, higher latency can lead to longer-running operations, indirectly increasing CU usage.

  • OneLake shortcuts reduce physical data movement. Shortcuts allow multiple Fabric items to reference the same physical dataset without copying it. This significantly reduces both storage duplication and compute overhead while improving architectural efficiency.

3.2.- Azure Databricks

In Azure Databricks, data movement costs are more explicit and easier to observe, but also easier to control through good architecture.

Databricks separates compute, storage, and networking, so movement costs typically appear in the following ways:

  • Additional compute runtime (DBUs): Every time data is read from or written to Azure Data Lake Storage (ADLS) Gen2, clusters must be running. Longer pipelines, cross-region access, and repeated transformations translate directly into more DBUs consumed.

  • Azure network egress charges: Cross-region reads or writes to ADLS Gen2 can incur direct Microsoft Azure data transfer costs, on top of Databricks compute charges.

  • Redundant pipelines across workspaces: Ingesting the same datasets into multiple Azure Databricks workspaces (or multiple lakes) multiplies both storage and compute costs.

  • Serverless data transfer charges (when applicable): For Databricks Serverless workloads, data transfer and connectivity can introduce additional per-GB charges depending on the access patterns.

Databricks mitigates these costs through several architectural practices:

  • Shared storage across workspaces using Azure Data Lake Storage Gen2

  • Delta Lake as the single source of truth

  • Compute-to-data patterns, spinning up compute close to where the data already resides to minimize unnecessary movement.
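The duplication multiplier above is easy to quantify with a toy model. The rates below are placeholders, not real Azure prices; the point is the linear multiplication when each consumer ingests its own copy instead of referencing a shared one (via a OneLake shortcut or a shared ADLS Gen2 container):

```python
def duplication_cost(dataset_gb: float, consumers: int,
                     storage_rate_gb: float, ingest_cost_per_gb: float,
                     duplicate: bool = True) -> float:
    """Toy model: copying data per consumer multiplies storage + ingestion
    cost, while a shared source pays those costs once. Rates are placeholders."""
    copies = consumers if duplicate else 1
    storage = dataset_gb * copies * storage_rate_gb
    ingestion = dataset_gb * copies * ingest_cost_per_gb
    return round(storage + ingestion, 2)

# 500 GB dataset, 4 consuming teams, assumed $0.02/GB-month storage
# and $0.05/GB ingestion pipeline cost:
print(duplication_cost(500, 4, 0.02, 0.05))                  # one copy each
print(duplication_cost(500, 4, 0.02, 0.05, duplicate=False)) # shared copy
```

Four independent copies cost four times the shared baseline, and the gap widens with every new consumer, which is the economic argument behind shortcuts and compute-to-data patterns.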

4.- Query Behavior and Cost Management in Modern Analytics Platforms

Query behavior describes how users, applications, and automated processes interact with data—how frequently queries run, how much data they scan, how complex they are, and how many execute concurrently.

In consumption-based analytics platforms, query behavior is one of the strongest determinants of cost. Full table scans, poorly filtered joins, repeated ad-hoc exploration, and uncontrolled concurrency all translate directly into additional compute consumption.

Individually, these inefficiencies may appear minor. At scale—especially as data volumes and self-service analytics grow—they become a major driver of unpredictable spending.

For this reason, query behavior should be treated not only as a performance concern but also as a core financial control in modern data platforms.

4.1.- Shared Best Practices for Optimizing Query Behavior

  • Design data models for access patterns, not just correctness: Queries are only as efficient as the models they run against. Well-designed schemas, appropriate grain, and reduced cardinality allow engines to scan less data and complete faster.

  • Avoid repeated computation: Many analytics workloads repeatedly answer the same questions. Without caching, materialized views, or pre-aggregated layers, platforms must recompute results every time. Identifying high-frequency queries and pre-computing their outputs is one of the most effective cost optimizations.

  • Manage concurrency deliberately: Concurrency is a hidden cost amplifier. Even efficient queries become expensive when many run simultaneously. Separating exploratory workloads from production dashboards and staggering refresh schedules helps prevent unnecessary compute spikes.

  • Align compute with query intent: Different workloads require different performance characteristics. Exploratory analysis, scheduled reporting, and operational dashboards should run on execution environments aligned with their latency and reliability needs.

  • Make query cost visible: Optimization improves dramatically when teams can see which queries are expensive and why. Query-level observability—execution time, data scanned, and compute consumed—enables targeted improvements instead of reactive scaling.
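That last practice can be sketched in a few lines: given query-level records from whatever your platform exposes (the Fabric Capacity Metrics app, or Databricks system tables), aggregating compute by query pattern surfaces the best candidates for caching or pre-aggregation. The records and field names below are made up for illustration.

```python
from collections import defaultdict

def top_cost_drivers(query_log, n=3):
    """Aggregate compute consumed per query pattern and return the most
    expensive ones -- the candidates for caching or pre-aggregation.
    Record fields are hypothetical, not a real platform schema."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for rec in query_log:
        totals[rec["pattern"]] += rec["compute_units"]
        counts[rec["pattern"]] += 1
    ranked = sorted(totals, key=totals.get, reverse=True)
    return [(p, round(totals[p], 1), counts[p]) for p in ranked[:n]]

# Made-up records: a frequent dashboard query quietly dominates total spend,
# even though each individual run is cheap.
log = ([{"pattern": "daily_sales_dashboard", "compute_units": 1.2}] * 50
       + [{"pattern": "adhoc_full_scan", "compute_units": 18.0}] * 2
       + [{"pattern": "inventory_report", "compute_units": 3.0}] * 5)

print(top_cost_drivers(log))
```

Note the ranking: the cheap-per-run dashboard query outspends the occasional full scan in aggregate, which is why frequency matters as much as per-query cost.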

4.2.- Query Behavior Considerations in Microsoft Fabric

In Microsoft Fabric, query behavior directly affects Capacity Unit (CU) consumption across the shared capacity.

Fabric’s bursting and smoothing mechanisms can temporarily absorb spikes in workload demand. However, excess consumption is later smoothed over time, meaning inefficient queries may not fail immediately but can gradually erode available capacity.

This makes query discipline particularly important.

Key considerations include:

  • Semantic model quality — Inefficient semantic models often generate complex or repetitive SQL queries behind the scenes.

  • Concurrency management — Multiple moderate queries running simultaneously can consume more capacity than a few well-optimized ones.

  • Shared-capacity awareness — Because multiple workloads share the same capacity, inefficient queries in one workload can impact performance across the entire environment.

In Fabric, optimizing query behavior is essential to maintaining predictable capacity usage and avoiding chronic throttling or unnecessary capacity upgrades.

4.3.- Query Behavior Considerations in Azure Databricks

In Azure Databricks, query behavior primarily affects cluster runtime and DBU consumption.

Databricks’ elastic compute model allows clusters to scale quickly to handle demand. While this flexibility is powerful, inefficient queries can extend cluster lifetimes, increase DBU consumption, and lead teams to over-provision resources.

Important considerations include:

  • Selecting the correct compute type: Interactive clusters, job clusters, SQL warehouses, and serverless environments each have different cost profiles.

  • Constraining exploratory queries: Unbounded queries or large scans during ad-hoc analysis can dominate compute usage.

  • Leveraging observability: Databricks system tables and workload metrics make it possible to identify and optimize the most expensive queries with precision.

When managed properly, Databricks’ elasticity becomes an advantage: compute scales up when needed and shuts down as soon as the workload completes.

5.- Tools to Monitor, Manage, and Predict Cost

Congratulations—you now have a clearer understanding of the pricing models behind Microsoft Fabric and Azure Databricks, as well as the cost implications associated with how each platform’s components are used.

But understanding pricing is only the beginning. The next step is operational discipline: continuously monitoring performance, identifying bottlenecks, and applying optimizations as needed. Once your solution is deployed to production—or even during performance testing in a staging environment—you must establish strong observability from day one.

If you can’t see where cost comes from, you can’t control it.

Across modern data platforms, the most significant cost failures rarely stem from pricing models themselves. They come from blind spots. When teams lack visibility into where costs are generated, who is driving them, and why they fluctuate over time, optimization becomes reactive, political, and ultimately inefficient.

Effective cost management, therefore, is not just about reporting. It’s about enabling early detection, clear attribution, and predictable consumption patterns—so teams can act proactively rather than retroactively.

5.1.- Microsoft Fabric

Microsoft Fabric provides capacity-level observability, aligned with its shared-compute architecture. Instead of tracking individual clusters or engines, Fabric focuses on how workloads consume Capacity Units (CUs) over time.

Fabric also offers workload-level monitoring. Teams can analyze pipelines, notebooks, and semantic models to identify inefficiencies such as long-running Spark jobs, inefficient refresh patterns, overly frequent semantic model refreshes, or BI workloads competing with ETL processes on the same capacity.

Because Microsoft Fabric uses a shared-capacity model, its usage insights naturally guide architectural decisions. Teams often optimize costs by separating ETL and BI workloads into different capacities, scheduling heavy processing during off-peak hours, or reducing redundant transformations within the same environment.

One of Fabric’s strengths is that it clearly shows when capacity is under pressure and which workload type is responsible. However, it also requires a different mindset: instead of optimizing individual job costs, teams must focus on capacity planning and workload isolation to maintain performance and cost efficiency.

5.1.1.- Microsoft Fabric Capacity App

A central tool is the Capacity Metrics App, which helps teams understand cost drivers across the platform. It provides visibility into CU utilization over time, bursting and smoothing behavior, throttling events, and which workloads (e.g., Apache Spark, SQL, dataflows, or Microsoft Power BI) are consuming capacity. This helps distinguish between temporary spikes and sustained capacity pressure. By default, the Microsoft Fabric Capacity Metrics App displays workload consumption trends over the past 14 days.

Fabric's Capacity Metrics App

5.1.2.- Fabric Cost Analysis Solution Accelerator

I want to recommend a free solution accelerator, available on GitHub, for monitoring cost on the Fabric platform: Fabric Cost Analysis (FCA). FCA is another tool that can help teams understand and monitor Microsoft Fabric costs.

FCA provides a unified view of Microsoft Fabric by combining financial and operational insights. It aggregates data from sources like Azure Cost Management and enriched datasets, enabling both high-level analysis and deep dives into usage, quotas, and reservations. You can watch a demo here.

Built entirely on Fabric by Microsoft employees (Romain Casteres, Cédric Dupui, and Manel Loubna Omani), FCA uses Pipelines and Notebooks for data processing, stores data in raw and Delta formats, and enables direct access through Power BI. It offers standard reports while allowing users to customize and extend the analysis, including integration with external data sources. That said, the FCA solution accelerator isn't an official Microsoft solution.

5.1.3.- Microsoft Fabric Go‑Live Assessment

Customers with an active Unified Support contract can request the Microsoft Fabric Go-Live Assessment, a Microsoft Services-led engagement (VBD, Value Based Delivery) in which an accredited Cloud Solution Architect (CSA) works with your organization on a personalized plan to assess the overall health of your Microsoft Fabric tenant: capacity usage, governance, security, compliance, and operational readiness. The engagement identifies risks, best-practice gaps, and cost-optimization opportunities, and delivers clear, actionable remediation recommendations so your Fabric environment is ready to operate and scale confidently in production.

5.2.- Azure Databricks

Azure Databricks takes a different approach by emphasizing explicit compute ownership and granular cost attribution. Instead of abstracting infrastructure consumption, it exposes cost drivers in a way that aligns closely with how engineers think about workloads and execution.

One key capability is cluster-level visibility. Each cluster provides metrics such as runtime duration, node count, autoscaling behavior, and idle versus active time. This makes it easier to detect inefficiencies like oversized clusters or resources left running after jobs finish.

Databricks also provides job- and workload-level attribution. Costs can be linked to specific jobs, pipelines, users, or service principals, enabling effective chargeback or showback models across teams.

Another strength is cost attribution through tagging and system tables. Workspaces, clusters, and jobs can be tagged by team, project, or environment, allowing organizations to align spending with ownership. This integrates naturally with FinOps practices.

The main advantage of Azure Databricks is that it makes it straightforward to answer: “Which workload generated this cost?”

However, visibility alone does not eliminate waste. Without governance practices such as auto-termination, job clusters, and consistent tagging, inefficient resource usage can still occur.

5.3.- Cross-Platform leveraging Microsoft Azure capabilities

Neither Microsoft Fabric nor Azure Databricks should be monitored in isolation. Azure Cost Management provides the unifying layer that enables consistent cost visibility and governance across both platforms.

Key cross‑platform practices include:

  • Cost aggregation across services — Azure Cost Management enables teams to view and analyze spend across Fabric capacities, Databricks workspaces, and underlying Azure resources in a single place.

  • Budgets, alerts, and anomaly detection — Defining budgets and automated alerts helps teams proactively detect cost spikes, unexpected usage patterns, and budget overruns before they become issues.

  • FinOps operating model — The most successful organizations embed FinOps practices that align engineering, platform teams, and finance around shared cost accountability and continuous optimization.

FinOps transforms cost management from a passive reporting activity into an active feedback loop, enabling faster decision‑making, better architectural choices, and sustained cost efficiency across platforms.

The Architect’s Responsibility in the AI Era

In modern analytics platforms, cost is not negotiated—it is designed.

Every architectural decision influences how efficiently compute is consumed: how data is stored, moved, queried, and scaled. As adoption grows, these decisions compound.

I'm sure Iasa Global will agree with me here.

Good architectures scale performance. Great architectures scale performance and cost efficiency together.

And in the AI era, that difference matters more than ever.


Let’s talk!
Let's have cafecito together.

If you’re a Chief Data Officer (CDO), a data leader, or simply someone who believes in the power of preparing data for AI—you’re already a Data Massagist.

Whether you have an idea, a challenge, or just want a fresh perspective, let’s connect. I’m always open to collaborating, learning, and helping others move forward.

You can find me on LinkedIn (feel free to connect and send me a message), or book time with me directly for a virtual coffee (or "cafecito").