Opening Hook & Teaching Promise

Somewhere right now, a data analyst is heroically exporting a hundred‑megabyte CSV from Microsoft Fabric—again. Because apparently, the twenty‑first century still runs on spreadsheets and weekend refresh rituals. Fascinating. The irony is that Fabric already solved this, but most people are too busy rescuing their own data to notice.

Here’s the reality nobody says out loud: most Fabric projects burn more compute in refresh cycles than they did in entire Power BI workspaces. Why? Because everyone keeps using Dataflows Gen 2 like it’s still Power BI’s little sidecar. Spoiler alert—it’s not. You’re stitching together a full‑scale data engineering environment while pretending you’re building dashboards.

Dataflows Gen 2 aren’t just “new dataflows.” They are pipelines wearing polite Power Query clothing. They can stage raw data, transform it across domains, and serve it straight into Direct Lake models. But if you treat them like glorified imports, you pay for movement twice: once pulling from the source, then again refreshing every dependent dataset. Double the compute, half the sanity.

Here’s the deal. Every Fabric dataflow architecture fits one of three valid patterns—each tuned for a purpose, each with distinct cost and scaling behavior. One saves you money. One scales like a proper enterprise backbone. And one belongs in the recycle bin with your winter 2021 CSV exports.

Stick around. By the end of this, you’ll know exactly how to design your dataflows so that compute bills drop, refreshes shrink, and governance stops looking like duct‑taped chaos. Let’s dissect why Fabric deployments quietly bleed money and how choosing the right pattern fixes it.

Section 1 – The Core Misunderstanding: Why Most Fabric Projects Bleed Money

The classic mistake goes like this: someone says, “Oh, Dataflows—that’s the ETL layer, right?” Incorrect. That was Power BI logic. In Fabric, the economic model flipped. Compute—not storage—is the metered resource.
Every refresh triggers a full orchestration of compute; every repeated import multiplies that cost.

Power BI’s import model trained people badly. Back then, storage was finite, compute was hidden, and refresh was free—unless you hit capacity limits. Fabric, by contrast, charges you per activity. Refreshing a dataflow isn’t just copying data; it spins up distributed compute clusters, loads staging memory, writes delta files, and tears it all down again. Do that across multiple workspaces? Congratulations, you’ve built a self‑inflicted cloud mining operation.

Here’s where things compound. Most teams organize Fabric exactly like their Power BI workspace folders—marketing here, finance there, operations somewhere else—each with its own little ingestion pipeline. Then those pipelines all pull the same data from the same ERP system. That’s multiple concurrent refreshes performing identical work, hammering your capacity pool, all for identical bronze data. Duplicate ingestion equals duplicate cost, and no amount of slicer optimization will save you.

Fabric’s design assumes a shared lakehouse model: one storage pool feeding many consumers. In that model, data should land once, in a standardized layer, and everyone else references it. But when you replicate ingestion per workspace, you destroy that efficiency. Instead of consolidating lineage, you spawn parallel copies with no relationship to each other. Storage looks fine—the files are cheap—but compute usage skyrockets.

Dataflows Gen 2 were refactored specifically to fix this. They support staging directly to delta tables, they understand lineage natively, and they can reference previous outputs without re‑processing them. Think of Gen 2 not as Power Query’s cousin but as Fabric’s front door for structured ingestion. It builds lineage graphs and propagates dependencies so you can chain transformations without re‑loading the same source again and again.
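To make the duplicate-ingestion math concrete, here is a back-of-envelope model in Python. The capacity-unit numbers and the cost function are illustrative assumptions, not Fabric's actual billing model:

```python
# Back-of-envelope model of why duplicated ingestion multiplies compute.
# All numbers are illustrative assumptions, not Fabric pricing.

def daily_ingestion_cost(cu_per_refresh: float, refreshes_per_day: int,
                         ingesting_workspaces: int) -> float:
    """Capacity units burned per day on ingestion alone."""
    return cu_per_refresh * refreshes_per_day * ingesting_workspaces

# Five departments each pulling the same ERP data four times a day...
duplicated = daily_ingestion_cost(cu_per_refresh=100.0, refreshes_per_day=4,
                                  ingesting_workspaces=5)

# ...versus one shared staging dataflow doing the same four pulls once.
shared = daily_ingestion_cost(cu_per_refresh=100.0, refreshes_per_day=4,
                              ingesting_workspaces=1)

print(duplicated, shared, duplicated / shared)  # 2000.0 400.0 5.0
```

The multiplier is simply the number of workspaces doing identical pulls, which is why consolidating ingestion pays off linearly with every team you onboard.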
But that only helps if you architect them coherently.

Once you grasp how compute multiplies, the path forward is obvious: architect dataflows for reuse. One ingestion, many consumers. One transformation, many dependents. Which raises the crucial question—out of the infinite ways you could wire this, why are there exactly three architectures that make sense? Because every Fabric deployment lives on a triangle of cost, governance, and performance. Miss one corner, and you start overpaying.

So, before we touch a single connector or delta path, we’re going to define those three blueprints: Staging for shared ingestion, Transform for business logic, and Serve for consumption. Master them, and you stop funding Microsoft’s next datacenter through needless refresh cycles. Ready? Let’s start with the bronze layer—the pattern that saves you money before you even transform a single row.

Section 2 – Architecture #1: Staging (Bronze) Dataflows for Shared Ingestion

Here’s the first pattern—the bronze layer, also called the staging architecture. This is where raw data takes its first civilized form. Think of it like a customs checkpoint between your external systems and the Fabric ecosystem. Every dataset, from CRM exports to finance ledgers, must pass inspection here before entering the city limits of transformation.

Why does this matter? Because external data sources are expensive to touch repeatedly. Each time you pull from them, you’re paying with compute, latency, and occasionally your dignity when an API throttles you halfway through a refresh. The bronze dataflow fixes that by centralizing ingestion. You pull from the source once, land it cleanly into delta storage, and then everyone else references that materialized copy. The key word—references, not re‑imports.

Here’s how this looks in practice. You set up a dedicated workspace—call it “Data Ingestion” if you insist on dull names—attached to your standard Fabric capacity.
Within that workspace, each Dataflow Gen 2 process connects to an external system: Salesforce, Workday, SQL Server, whatever system of record you have. The dataflow retrieves the data, applies lightweight normalization—standardizing column names, ensuring types are consistent, removing the occasional stray nulls—and writes it into your Lakehouse as Delta files.

Now stop there. Don’t transform business logic, don’t calculate metrics, don’t rename “Employee” to “Associates.” That’s silver‑layer work. Bronze is about reliable landings. Everything landing here should be traceable back to an external source, historically intact, and refreshable independently. Think “raw but usable,” not “pretty and modeled.”

The payoff is huge. Instead of five departments hitting the same CRM API five separate times, they hit the single landed version in Fabric. That’s one refresh job, one compute spin‑up, one delta write. Every downstream process can then link to those files without paying the ingestion tax again. Compute drops dramatically, while lineage becomes visible in one neat graph.

Now, why does this architecture thrive specifically in Dataflows Gen 2? Because Gen 2 finally understands persistence. The moment you output to a delta table, Fabric tracks that table as part of the lakehouse storage, meaning notebooks, data pipelines, and semantic models can all read it directly. You’ve effectively created a reusable ingestion service without deploying Data Factory or custom Spark jobs. The dataflow handles connection management, scheduling, and even incremental refresh if you want to pull only changed records.

And yes, incremental refresh belongs here, not in your reports. Every time you configure it at the staging level, you prevent a full reload downstream. The bronze layer remembers what’s been loaded and fetches only deltas.
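That "fetches only deltas" behavior is, at its core, watermark bookkeeping. Here is a minimal sketch of the idea in Python, using a toy in-memory source; a real Gen 2 dataflow manages this state for you once incremental refresh is configured, so nothing here is Fabric's actual API:

```python
from datetime import datetime, timezone

# Toy "source system": rows with a last-modified timestamp.
source_rows = [
    {"id": 1, "modified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "modified": datetime(2024, 1, 5, tzinfo=timezone.utc)},
    {"id": 3, "modified": datetime(2024, 1, 9, tzinfo=timezone.utc)},
]

def incremental_pull(rows, watermark):
    """Fetch only rows changed since the last successful load."""
    delta = [r for r in rows if r["modified"] > watermark]
    # The new watermark is the latest timestamp we have now landed;
    # if nothing changed, the old watermark carries forward.
    new_watermark = max((r["modified"] for r in delta), default=watermark)
    return delta, new_watermark

# First run: everything is new relative to the initial watermark.
delta, wm = incremental_pull(
    source_rows, datetime(2023, 12, 31, tzinfo=timezone.utc))
print(len(delta))  # 3

# Second run: the source is unchanged, so nothing is re-ingested.
delta, wm = incremental_pull(source_rows, wm)
print(len(delta))  # 0
```

The point of doing this at the staging level is exactly what the paragraph above argues: the watermark lives in one place, so no downstream consumer ever needs to re-pull the full history.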
Between runs, the Lakehouse retains history as parquet or delta partitions, so you can roll back or audit any snapshot without re‑ingesting.

Let’s puncture a common mistake: pointing every notebook directly at the original data source. It feels “live,” but it’s just reckless. That’s like giving every intern a key to the production database. You overload source systems and lose control of refresh timing. A proper bronze dataflow acts as the isolating membrane—external data stays outside, your Lakehouse holds the clean copy, and everyone else stays decoupled.

From a cost perspective, this is the cheapest layer per unit of data volume. Storage is practically free compared to compute, and Fabric’s delta tables are optimized for compression and versioning. You pay a small fixed compute cost for each ingestion, then reuse that dataset indefinitely. Contrast that with re‑ingesting snippets for every dependent report—death by refresh cycles.

Once your staging dataflows are stable, test lineage. You should see straight lines: source → Dataflow → delta output. If you see loops or multiple ingestion paths for the same entity, congratulations—you’ve built redundancy masquerading as best practice. Flatten it.

So, with the bronze pattern, you achieve three outcomes—what a physicist would call equilibrium. One, every external source lands once, not five times. Two, you gain immediate reusability through delta storage. Three, governance becomes transparent because you can approve lineage at ingestion instead of auditing chaos later.

When this foundation is solid, your data estate stops resembling a spaghetti bowl and starts behaving like an orchestrated relay. Each subsequent layer pulls cleanly from the previous without waking any source system. The bronze tier doesn’t make data valuable—it makes it possible.
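That lineage test ("flatten duplicate ingestion paths") can be sketched as a check over lineage edges. This is a minimal illustration in Python; the entity and dataflow names are hypothetical, and in practice you would read these edges from Fabric's lineage view rather than hand-code them:

```python
from collections import defaultdict

# Hypothetical lineage edges: (source entity, ingesting dataflow).
ingestion_edges = [
    ("erp.customers", "DF_Bronze_Ingestion"),
    ("erp.orders",    "DF_Bronze_Ingestion"),
    ("erp.customers", "DF_Marketing_Import"),  # redundant second pull
]

def redundant_ingestions(edges):
    """Return source entities landed by more than one dataflow."""
    pulls = defaultdict(set)
    for source, dataflow in edges:
        pulls[source].add(dataflow)
    # Any entity with two or more ingestion paths is a candidate to flatten.
    return {src: sorted(flows) for src, flows in pulls.items()
            if len(flows) > 1}

print(redundant_ingestions(ingestion_edges))
# {'erp.customers': ['DF_Bronze_Ingestion', 'DF_Marketing_Import']}
```

A healthy bronze layer makes this dictionary come back empty: one ingestion path per external entity, straight lines all the way down.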
And once that possibility stabilizes, you’re ready to graduate to the silver layer, where transformation and business logic finally earn their spotlight.

Section 3 – Architecture #2: Transform (Silver) Dataflows for Business Logic & Quality

Now that your bronze layer is calmly landing data like a responsible adult, it’s time to talk about the silver layer — the Transform architecture. This is where data goes from “merely collected” to “business‑ready.”
If this clashes with how you’ve seen it play out, I’m always curious. I use LinkedIn for the back-and-forth.