Ever wondered why your machine learning models just aren't moving fast enough from laptop to production? You're staring at data in your Lakehouse, notebooks are everywhere, and you're still coordinating models on spreadsheets. What if there's a way to finally connect all those dots, inside Microsoft Fabric? Stay with me. Today, we'll walk through how Fabric's data science workspace hands you a systems-level ML toolkit: real-time Lakehouse access, streamlined Python notebooks, and full-blown model tracking, all in one pane. Ready to see where the friction drops out?

From Data Swamp to Lakehouse: Taming Input Chaos

If you've ever tried to start a machine learning project and felt like you were just chasing down files, you're definitely not alone. This is the part nobody romanticizes: hunting through cloud shares, email chains, and legacy SQL databases just to scrape enough rows together for a pass at training. The reality is, before you ever get to modeling, most of your time goes to what can only be described as custodial work. And yet, it's the single biggest reason most teams never even make it out of the gate.

Let's not sugarcoat it: data is almost never where you want it. You end up stuck in the most tedious scavenger hunt you never signed up for, just to load up raw features. Exporting CSVs from one tool, connecting to different APIs in another, and then piecing everything together in Power BI, or, if you're lucky, getting half the spreadsheet over email after some colleague remembered to hit "send." By the time you think you're ready for modeling, you've got the version nobody trusts and half a dozen lingering questions about what, exactly, that column "updated_date" really means.

It's supposed to be smoother with modern cloud platforms, right? But even after your data's "in the cloud," it somehow ends up scattered. Files sit in Data Lake Gen2, queries in Synapse, reports in Excel Online, and you're toggling permissions on each, trying to keep track of where the truth lives. Every step creates a risk that something leaks out, gets accidentally overwritten, or just goes stale. Anyone who's lost a few days to tracking down which environment is the real one knows there's a point where the tools themselves get in the way just as much as the bureaucracy.

That's not even the worst of it. The real showstopper is when it's time to build actual features, and you realize you don't have access to the columns you need. Maybe the support requests data is owned by a different team, or finance isn't comfortable sharing transaction details except as redacted monthly summaries. So now you're juggling permissions and audit logs, one more layer of friction before you can even test an idea. It's a problem that compounds fast. Each workaround, each exported copy, becomes a liability. That's usually when someone jokes about "building a house with bricks delivered to five different cities," and at this point, it barely sounds like a joke.

Microsoft Fabric's Lakehouse shakes that expectation up. Ignore the buzzword bingo for a minute: Lakehouse is less about catching up with trends and more about infrastructure actually working for you. Instead of twelve different data puddles, you've got one spot. Raw data lives alongside cleaned, curated tables, with structure and governance built in as part of the setup. For once, you don't need a data engineer just to find your starting point.
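To make that concrete, here's a minimal sketch of what "finding your starting point" can look like from a Fabric notebook with a default Lakehouse attached. It assumes the `spark` session that Fabric notebooks predefine, and the table and file names (`sales`, `support_tickets`, `Files/raw/inventory.csv`) are hypothetical stand-ins; the point is simply that curated tables and raw files sit side by side in the same Lakehouse.

```python
# A Fabric notebook with a default Lakehouse attached exposes two areas:
# managed Delta tables ("Tables") and arbitrary files ("Files").
# All table and file names below are hypothetical stand-ins.

# Curated, governed tables: read them like any Spark table.
sales = spark.read.table("sales")
tickets = spark.read.table("support_tickets")

# Raw landing-zone files live right alongside the curated tables.
inventory_raw = (
    spark.read
    .option("header", True)
    .csv("Files/raw/inventory.csv")
)

# Quick sanity checks before any modeling work.
sales.printSchema()
print(sales.count(), tickets.count(), inventory_raw.count())
```

Because the curated tables and the raw files live in the same governed workspace, the same handful of reads serves an analyst previewing data and a data scientist building features; nobody needs a separate export.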
Even business analysts, not just your dev team with the right keys, are able to preview, analyze, and combine live data, all through the same central workspace.

Picture this: a business analyst wants to compare live sales, recent support interactions, and inventory. They go into the Lakehouse workspace in Fabric, pull the current transactions over, and blend in recent tickets, all while skipping the usual back-and-forth with IT. There are no frantic requests to unblock a folder or approve that last-minute API call. The analyst gets the view they need, on demand, and nothing has to be passed around through shadow copies or side channels.

The security story is bigger than it looks, too. Instead of gluing together role-based access from five different systems, or worse, trusting everyone just to delete their old copies, permissions sit right at the data workspace level. If you need only sales, you get just sales. If someone on a different team needs to reference inventory, they see what's necessary, and nothing else. There's no need for late-night audits or accidental oversharing sent out in another email blast. This kind of granular control has teeth, and it means the system is finally working for you, not just for IT compliance officers.

Most ML tools promise "easy access," but Fabric's Lakehouse sets it up for both sides: technical users dive straight into raw data, while analysts use the same space with understandable, curated views. It eliminates most of those arguments about "missing context," and it's the first time both sides can operate in parallel without running into each other. Suddenly, feeding a model isn't a six-step puzzle; it's just picking from well-organized inputs.

Now, don't get me wrong: centralizing your data feels like arriving at the party, but you're still far from launching an ML model. Taming your input chaos only lines up the dominoes. You know exactly who has access, you've finally got a data system everyone can agree on, and everything's ready to pipe into your next experiment. But here's the sticking point: even with this head start, most teams hit friction when it's time to move from wrangled data to actual feature engineering and model-building. Notebooks are supposed to be the on-ramp, but more often, they become their own maze of version conflicts, broken environments, and lost progress.

So with your inputs sorted, both secure and actually usable, you'd expect things to speed up. But what's supposed to be the easy part, collaborating in notebooks, still brings plenty of pain. Why do so many projects stall when all you're trying to do is turn clean data into something the model can use?

Python Notebooks Without the Pain: Streamlining the ML Process

If you've spent any time actually trying to move a project forward in Python notebooks, you already know how this goes. It starts with a good intention: let's reuse what already works, just clone your teammate's notebook and hit run. But then you land in dependency purgatory: pandas throws a version error, matplotlib won't plot the way you expect, and half the code relies on a package nobody mentioned in the docs. Even inside cloud platforms that promise smooth collaboration, you're jumping between kernels, patching environments, and quietly dreading that awkward chat asking who set up the original development space. Instead of a sprint, it feels like you're wading through molasses.
The joke in most data science teams is that there are always more notebook versions than people in the workspace, and nobody's sure which notebook even worked last. We've all seen the file called "final_notebook_v5_actual.ipynb" and its six unofficial cousins. When everyone works in their own little bubble, changes pile up fast. Maybe your colleague added a new feature engineering trick but saved it locally. Someone else tweaked the pipeline for the customer churn dataset but didn't sync it back to the team folder. And you, working late, discover that the only working notebook relies on libraries last updated in 2021. Just setting things up burns through the first chunk of your project budget. Version control gets messy, dependencies drift, and suddenly the project's output becomes as unpredictable as the tools themselves.

Now, let's be honest: this isn't just a confidence issue for junior data scientists. Even seasoned teams trip over this problem. Maybe you run notebooks through Databricks, or you're using JupyterHub spun up on a managed VM, but environments mutate, and tracking which projects have which libraries is a problem nobody admits to enjoying. Meetings start with five minutes of "wait, are you using the new version or the one I sent on Teams?" It's a shared kitchen where every chef brings their own knives, and you're left with a half-baked stew because half the utensils are missing or nobody remembered whose batch was actually edible.

This is one of the places where Microsoft Fabric flips the usual story. Each new notebook session comes pre-configured with the baseline set most people need: scikit-learn, pandas, PyTorch, and the rest. You don't have to run endless install scripts, fudge dependency versions, or file tickets just to get the essentials. The environment sits on a foundation that's stable, predictable, and updated by the platform. That means more time tuning your model and less time scanning Stack Overflow for fixes to some cryptic pip exception.

But availability isn't just about libraries. It's about getting right to the data, right now, not after a twelve-step API dance. Fabric's notebooks tie directly into the Lakehouse: no jumping through hoops, no awkward connection strings. You click into a workspace, the Lakehouse tables are ready for you, and you can immediately experiment, sample data, engineer features, and build holdout sets without copying files around the org. You're not hunting for which cluster has access, or figuring out what secrets.json someone left behind. The workflow moves the way you expect: explore, code, iterate.

Let's say you're a data scientist actually kicking off a new experiment. You spin up a new notebook inside your team's Fabric workspace. You need last quarter's sales, customer feedback scores, and inventory turns. All of that is sitting right there in the Lakehouse, live and ready. You pull in your sales data, engineer some features (maybe encode product categories and join the support tickets), then train a quick baseline model. It's familiar, but minus the usual overhead: nobody comes knocking about missing files, and there's no scramble to reconfigure your environment.
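Picking up where that scenario leaves off, here's a hedged, minimal sketch of what that loop can look like inside one Fabric notebook, leaning on the preinstalled pandas/scikit-learn stack and Fabric's built-in MLflow tracking. The table names (`sales`, `support_tickets`), the column names, the `churned` label, and the experiment name are all hypothetical; treat this as the shape of the workflow, not the exact code for your data.

```python
import mlflow
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Lakehouse tables are already in reach from the predefined spark session;
# no connection strings. Table and column names here are hypothetical.
sales = spark.read.table("sales").toPandas()
tickets = spark.read.table("support_tickets").toPandas()

# Feature engineering: count tickets per customer and join them in,
# then one-hot encode product categories.
ticket_counts = (
    tickets.groupby("customer_id").size().reset_index(name="ticket_count")
)
df = sales.merge(ticket_counts, on="customer_id", how="left")
df["ticket_count"] = df["ticket_count"].fillna(0)
df = pd.get_dummies(df, columns=["product_category"])

# "churned" stands in for whatever label your experiment targets.
X = df.drop(columns=["churned", "customer_id"])
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fabric's workspace-level MLflow endpoint records the run for the team.
mlflow.set_experiment("churn-baseline")  # hypothetical experiment name
with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    mlflow.log_metric("accuracy", acc)
```

Notice what isn't here: no install scripts, no credentials file, no export step. The Spark session, the Python libraries, and the experiment tracking all come with the workspace session, which is exactly the overhead the scenario above says falls away.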