M365.FM - Modern work, security, and productivity with Microsoft 365

Is Your Dataflow Reusable—or a One-Trick Disaster?



Picture this: your lakehouse looks calm, clean Delta tables shining back at you. But without partitioning, schema enforcement, or incremental refresh, it’s not a lakehouse—it’s a swamp. And swamps eat performance, chew through storage, and turn your patience into compost. I’ve seen it happen in more tenants than I care to count. Here’s the fix: stick around, because I’ll give you a 60‑second checklist you can run against any dataflow—parameters, modular queries, Delta targets, and partitioning. Dataflows Gen2 use Power Query/M, so the same rules about modular queries and functions still apply. Subscribe at m365.show, grab the checklist, and let’s see if your “working” dataflow is actually a time bomb.

Why Your 'Working' Dataflow is Actually a Time Bomb

The real issue hiding in plain sight is this: your dataflow can look fine today and still be hanging by a thread. Most people assume that if it refreshes without error, it’s “done.” But that’s like saying your car is road‑worthy because it started once and the check engine light is off. Sure, it ran this morning—but what happens when something upstream changes and the entire pipeline starts throwing fits? The silent culprit is schema drift. Add one column, shift field order, tweak a data type, and your flow can tip over with no warning.

For most admins, this is where the blind spot kicks in. The obsession is always: “Did it refresh?” If yes, gold star. They stop there. But survival in the real world isn’t just about refreshing once; it’s about durability when change hits. And change always shows up—especially when you’re dealing with a CRM that keeps sprouting fields, an ERP system that can’t maintain column stability, or CSV files generously delivered by a teammate who thinks “metadata” is just a suggestion. That’s why flexibility and modularity aren’t buzzwords—they’re guardrails. Without them, your “fixed” pipe bursts as soon as the water pressure shifts.
And the fallout is never contained to the person who built the flow. Schema drift moves like a chain reaction in a chemical lab. One new field upstream, and within minutes you’ve got a dashboard graveyard. There’s Finance pushing panic because their forecast failed. Marketing complaints stack up because ad spend won’t tie out. The exec team just wants a slide with charts instead of cryptic error codes. You—the admin—are stuck explaining why a “tiny change” now has 20 dashboards flashing red. That’s not user error; that’s design fragility.

Here’s the blunt truth: Dataflows Gen2—and really every ETL process—are built on assumptions: the existence of a column, its data type, its order, its consistency. Break those assumptions, and your joins, filters, and calculations collapse. Real‑world schemas don’t sit politely; they zigzag constantly. So unless your dataflow was built to absorb these changes, it’s fragile by default. Think of it like relying on duct tape to hold the plumbing: it works in the moment, but it won’t survive the first surge of pressure. The smart move isn’t hope. It’s defense.

If schema drift has already burned you, there’s a quick diagnostic: run the 60‑second checklist. First, does your flow enforce schema contracts, or land data in a Delta table where schema evolution is controlled? Second, does it include logic to ingest new columns dynamically instead of instantly breaking? Third, are your joins coded defensively—validating types, handling nulls—rather than assuming perfect input? If you can’t check those boxes, then you’re not done; you’ve just delayed failure.

And before you think, “Great, so everything’s doomed,” there’s mitigation available. There are known strategies here: dynamic schema handling of the kind Mapping Data Flows offer through schema drift support, and parameterizing queries so they adapt without rewrites. CloudThat and others highlight how dynamic schema detection plus metadata repositories for mappings can reduce the fragility.
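In a Dataflow Gen2 this logic lives in Power Query M, but the defensive-ingestion idea is easier to show in a few lines of plain Python. This is a sketch under assumptions: the column names and the EXPECTED contract are hypothetical, and a real implementation would land rows into a Delta table with controlled schema evolution rather than a dict.

```python
# Illustrative sketch (plain Python standing in for Power Query M):
# reconcile incoming rows against an expected schema contract instead
# of assuming a fixed shape. All column names here are hypothetical.

EXPECTED = {"customer_id": int, "amount": float, "region": str}

def conform_row(row: dict) -> dict:
    """Return a row that always contains the expected columns.

    - Missing expected columns are filled with None instead of crashing joins.
    - Unexpected new columns are kept, so drift surfaces without breaking.
    - Types are coerced defensively where possible.
    """
    out = dict(row)  # keep drifted/new columns rather than dropping them
    for col, typ in EXPECTED.items():
        val = out.get(col)
        if val is None:
            out[col] = None          # tolerate a missing column
        else:
            try:
                out[col] = typ(val)  # tolerate a type change, e.g. "42" -> 42
            except (TypeError, ValueError):
                out[col] = None      # quarantine values that cannot be coerced
    return out

row = conform_row({"customer_id": "42", "amount": "9.5", "channel": "web"})
```

Those three moves—fill missing expected columns, carry unexpected ones, coerce types before trusting them—are what “joins coded defensively” means in practice, whatever language the pipeline is written in.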
Those aren’t silver bullets, but they keep your pipelines from detonating every time a developer adds a field on Friday afternoon. One important caveat: even a “healthy” Dataflow Gen2 design has limits. Dataflows don’t handle massive datasets as well as Spark, and wide joins or deep transformations can turn refreshes into crawl speed. If you know volumes will hit high scale, offload the heavy work to Spark notebooks and keep Dataflows for lighter prep. Key2Consulting and CloudThat both call this out in practice. Treat Dataflows as one tool in the kit—not the hammer for every job.

Bottom line: a so‑called working dataflow that can’t weather schema drift or large‑scale growth isn’t reliable. It’s fragile, adding silent debt into your system. And disposable pipelines, stacked on top of each other, create a tower of quick fixes nobody wants to maintain. That puts us at the next layer of the problem: the bad habits baked in from the very start. Think your setup looks clean? Let’s run it against the three sins that turn “working” pipelines into a nonstop ticket machine.

The Three Deadly Sins of Dataflow Design

Here’s where most dataflows go sideways: the three deadly sins of design. They’re simple, they’re common, and they guarantee headaches—hardcoding values, piling on spaghetti logic, and ignoring the fact that scale exists. We’ve all slipped into them because in the moment they look like shortcuts. The problem is when those “shortcuts” snake into production and you’re left with fragile pipelines no one wants to untangle.

First up: hardcoding. You’re tired, the refresh is failing, so you paste a file path or static date directly into your query. It works. For now. But what you’ve actually done is cement brittle assumptions into your pipeline. The second someone moves that file, renames a table, or asks for the same logic in a different workspace, the entire thing snaps. A better fix is dead simple—centralize values.
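To make “centralize values” concrete, here is a minimal sketch in Python rather than M. The dict stands in for a metadata table you would keep in SQL or Cosmos DB, and every name in it is hypothetical.

```python
# Sketch: environment config pulled from a metadata store instead of
# hardcoded into queries. The dict stands in for a SQL / Cosmos DB
# metadata table; folder and server names are made up for illustration.

CONFIG_TABLE = {
    "dev":  {"source_folder": "/landing/dev/sales",  "server": "sql-dev"},
    "prod": {"source_folder": "/landing/prod/sales", "server": "sql-prod"},
}

def resolve_config(environment: str) -> dict:
    """Look up connection settings for an environment; fail loudly if unknown."""
    try:
        return CONFIG_TABLE[environment]
    except KeyError:
        raise ValueError(f"No config registered for environment {environment!r}")

cfg = resolve_config("prod")
```

The point of the pattern: the flow itself never mentions a folder or server, it only mentions the environment, so promoting from dev to prod is a parameter swap instead of a rewrite.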
Either store them as parameters inside your dataflow or, if you’re managing multiple environments, put your config data in a metadata table. SQL or Cosmos DB both work fine for this. Then your flows don’t care which folder or server they’re pointing at—you just swap the parameter, and everything still refreshes.

Sin two: spaghetti logic. It usually starts clean—three steps to connect, transform, and load. Fast forward a few months and you’ve got twenty chained queries full of nested merges, conditional splits, and mystery filters no one admits to writing. At that point, your dataflow feels less like logic and more like a plate of noodles that shocks you if you pick the wrong one. Debugging is guesswork, collaboration is impossible, and governance goes out the window because nobody can even explain where the fields came from. The fix? Break the work into named, single-purpose queries. Use functions in Power Query M for reusable bits like date handling or path parsing. Yes, Dataflows Gen2 supports this, but remember: reusing those blocks across workspaces has limits. If you need true reuse across your tenant, build the canonical version in a single “source of truth” dataflow or push complex transformations down into your lakehouse or notebook layer. Bottom line—write logic in chunks that humans can read tomorrow, not in one monster chain nobody can ever touch again.

Last sin: ignoring scale. On demo-sized test data, everything looks magical. Four thousand rows? Instant refresh. Then the real dataset drops—four million rows with concurrent refreshes—and suddenly your dataflow is standing in quicksand. It backs up the refresh queue, hogs compute, and everything else grinds to a halt. This isn’t Fabric being weak; it’s your design never accounting for production volume. Small joins turn into bottlenecks, wide transformations chew through memory, and incremental refresh gets ignored until jobs start timing out.
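Before reaching for coalesce or repartition, it helps to have a target number in mind. Here is a back-of-envelope helper in Python; the 128 MB-per-partition figure is a common Spark rule of thumb, not a Fabric requirement, and the function is purely illustrative.

```python
def target_partitions(total_bytes: int, target_mb: int = 128) -> int:
    """Back-of-envelope partition count: aim for roughly target_mb per
    partition, never fewer than one. 128 MB is a widely used Spark rule
    of thumb, not a mandate; tune it to your workload."""
    target_bytes = target_mb * 1024 * 1024
    return max(1, -(-total_bytes // target_bytes))  # ceiling division

# Roughly 10 GB of data lands near 80 partitions at 128 MB each.
n = target_partitions(10 * 1024**3)
```

If that 10 GB table is currently scattered across thousands of tiny files, coalescing toward a number in this ballpark is usually the first win; going far below it just recreates the wide-partition memory problem from the other direction.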
If you actually want things to run, test with production-like volumes early. Use coalesce to cut down partition counts and repartition strategically so the engine isn’t juggling thousands of tiny chunks. And give yourself a hard rule: if your refresh times balloon past usable, it’s time to either push the heavy transformations into a Spark notebook or tune the partitioning until they behave. Test at scale, or production will test you instead.

Here’s the kicker—these sins don’t live in isolation. Hardcode enough values and you’ll be rewriting every time the environment shifts. Let spaghetti logic grow and you’re one step away from a full black box nobody understands. Ignore scale, and eventually workloads pile up until the whole refresh ecosystem collapses. Each one of these mistakes adds debt; mix them and you’re trading resilience for fragility at compound interest.

Fixing them isn’t about perfection—it’s about giving yourself guardrails. Safe defaults like parameters, modular queries, and realistic testing keep your pipelines stable enough to survive the normal chaos of changing schemas and growing datasets. The trick now is turning those guardrails into your standard operating mode, so you build flows that adapt instead of collapse. And that’s the bridge to the real differentiator—the design habits that actually make dataflows reusable instead of disposable.

The Secret Sauce: Modularity and Parameterization

So how do you actually keep a dataflow from turning into a throwaway experiment? The answer comes down to two things: modularity and parameterization. Skip those and you’re not building pipelines—you’re cobbling together one-offs that collapse the minute requirements shift. A reusable dataflow is one that can drop into another project, adapt with minimal tweaks, and still function. Anything else is just glorified copy-paste maintenance. Modularity starts with carving transformations into standalone steps.
Picture a block of logic that standardizes customer names. If you bury it inside a bloated 50-step chain, it’s stuck there forever. Pull that same logic into a separate function, and suddenly it’s a reusable tool across any data source. That’s the shift: building small, utility

Become a supporter of this podcast: https://www.spreaker.com/podcast/m365-fm-modern-work-security-and-productivity-with-microsoft-365--6704921/support.

If this clashes with how you’ve seen it play out, I’m always curious. I use LinkedIn for the back-and-forth.

By Mirko Peters (Microsoft 365 consultant and trainer)