Vanishing Gradients

By Hugo Bowne-Anderson

A podcast about all things data, brought to you by data scientist Hugo Bowne-Anderson.

It's time for more critical conversations about the challenges in our industry in order to build better compasses...

· Technology

Rating: 5 (1111 ratings)

FAQs about Vanishing Gradients:

How many episodes does Vanishing Gradients have?

The podcast currently has 61 episodes available.

Vanishing Gradients episodes:

October 16, 2025 · Episode 61: The AI Agent Reliability Cliff: What Happens When Tools Fail in Production
Most AI teams find their multi-agent systems devolving into chaos, but ML Engineer Alex Strick van Linschoten argues they are ignoring the production reality. In this episode, he draws on insights from the LLMOps Database (750+ real-world deployments at the time of recording; now nearly 1,000!) to show how to systematically measure and engineer constraints, turning unreliable prototypes into robust, enterprise-ready AI.
Drawing from his work at ZenML, Alex details why success requires scaling down and enforcing MLOps discipline to navigate the unpredictable "Agent Reliability Cliff". He provides the essential architectural shifts, evaluation hygiene techniques, and practical steps needed to move beyond guesswork and build scalable, trustworthy AI products.
We talk through:
Why "shoving a thousand agents" into an app is the fastest route to unmanageable chaos
The essential MLOps hygiene (tracing and continuous evals) that most teams skip
The optimal (and very low) limit for the number of tools an agent can reliably use
How to use human-in-the-loop strategies to manage the risk of autonomous failure in high-sensitivity domains
The principle of using simple Python/RegEx before resorting to costly LLM judges
LINKS
The LLMOps Database: 925 entries as of today. Submit a use case to help it get to 1K!
Upcoming Events on Luma
Watch the podcast video on YouTube
🎓 Learn more:
This was a guest Q&A from Building LLM Applications for Data Scientists and Software Engineers: https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=AI20
Next cohort starts November 3: come build with us!
29min
September 30, 2025 · Episode 60: 10 Things I Hate About AI Evals with Hamel Husain
Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.
Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.
We talk through:
The 10(+1) critical mistakes that cause teams to waste time on evals
Why "hallucination scores" are a waste of time (and what to measure instead)
The manual review process that finds major issues in hours, not weeks
A step-by-step method for building LLM judges you can actually trust
How to use domain experts without getting stuck in endless review committees
Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents
If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.
LINKS
Hamel's website and blog
Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise
Hamel Husain on Lenny's podcast, which includes a live demo of error analysis
The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era
Upcoming Events on Luma
Watch the podcast video on YouTube
Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6, and this link gives 35% off! https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — https://maven.com/s/course/d56067f338
1h 14min
September 23, 2025 · Episode 59: Patterns and Anti-Patterns For Building with AI
John Berryman (Arcturus Labs; early GitHub Copilot engineer; co-author of Relevant Search and Prompt Engineering for LLMs) has spent years figuring out what makes AI applications actually work in production. In this episode, he shares the “seven deadly sins” of LLM development — and the practical fixes that keep projects from stalling.
From context management to retrieval debugging, John explains the patterns he’s seen succeed, the mistakes to avoid, and why it helps to think of an LLM as an “AI intern” rather than an all-knowing oracle.
We talk through:
Why chasing perfect accuracy is a dead end
How to use agents without losing control
Context engineering: fitting the right information in the window
Starting simple instead of over-orchestrating
Separating retrieval from generation in RAG
Splitting complex extractions into smaller checks
Knowing when frameworks help — and when they slow you down
A practical guide to avoiding the common traps of LLM development and building systems that actually hold up in production.
LINKS:
Context Engineering for AI Agents, a free, upcoming lightning lesson from John and Hugo
The Hidden Simplicity of GenAI Systems, a previous lightning lesson from John and Hugo
Roaming RAG – RAG without the Vector Database, by John
Cut the Chit-Chat with Artifacts, by John
Prompt Engineering for LLMs by John and Albert Ziegler
Relevant Search by John and Doug Turnbull
Arcturus Labs
Watch the podcast on YouTube
Upcoming Events on Luma
🎓 Learn more:
Hugo's course (this episode was a guest Q&A from the course): Building LLM Applications for Data Scientists and Software Engineers — https://maven.com/s/course/d56067f338
48min
September 09, 2025 · Episode 58: Building GenAI Systems That Make Business Decisions with Thomas Wiecki (PyMC Labs)
While most conversations about generative AI focus on chatbots, Thomas Wiecki (PyMC Labs, PyMC) has been building systems that help companies make actual business decisions. In this episode, he shares how Bayesian modeling and synthetic consumers can be combined with LLMs to simulate customer reactions, guide marketing spend, and support strategy.
Drawing from his work with Colgate and others, Thomas explains how to scale survey methods with AI, where agents fit into analytics workflows, and what it takes to make these systems reliable.
We talk through:
Using LLMs as “synthetic consumers” to simulate surveys and test product ideas
How Bayesian modeling and causal graphs enable transparent, trustworthy decision-making
Building closed-loop systems where AI generates and critiques ideas
Guardrails for multi-agent workflows in marketing mix modeling
Where generative AI breaks (and how to detect failure modes)
The balance between useful models and “correct” models
If you’ve ever wondered how to move from flashy prototypes to AI systems that actually inform business strategy, this episode shows what it takes.
LINKS:
The AI MMM Agent, An AI-Powered Shortcut to Bayesian Marketing Mix Insights
AI-Powered Decision Making Under Uncertainty Workshop w/ Allen Downey & Chris Fonnesbeck (PyMC Labs)
The podcast livestream on YouTube
Upcoming Events on Luma
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — https://maven.com/s/course/d56067f338
1h 1min
August 29, 2025 · Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)
While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.
Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.
We talk through:
Treating LLM workflows as ETL pipelines for unstructured text
Error analysis: why you need humans reviewing the first 50–100 traces
Guardrails like retries, validators, and “gleaning”
How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs
Cheap vs. expensive models: when to swap for savings
Where agents fit in (and where they don’t)
If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank.
LINKS
Shreya's website
DocETL, A system for LLM-powered data processing
Upcoming Events on Luma
Watch the podcast video on YouTube
Shreya's AI evals course, which she teaches with Hamel "Evals" Husain
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — https://maven.com/s/course/d56067f338
42min
August 14, 2025 · Episode 56: DeepMind Just Dropped Gemma 270M... And Here’s Why It Matters
While much of the AI world chases ever-larger models, Ravin Kumar (Google DeepMind) and his team build across the size spectrum, from billions of parameters down to this week’s release: Gemma 270M, the smallest member yet of the Gemma 3 open-weight family. At just 270 million parameters, a quarter the size of Gemma 1B, it’s designed for speed, efficiency, and fine-tuning.
We explore what makes 270M special, where it fits alongside its billion-parameter siblings, and why you might reach for it in production even if you think “small” means “just for experiments.”
We talk through:
Where 270M fits into the Gemma 3 lineup — and why it exists
On-device use cases where latency, privacy, and efficiency matter
How smaller models open up rapid, targeted fine-tuning
Running multiple models in parallel without heavyweight hardware
Why “small” models might drive the next big wave of AI adoption
If you’ve ever wondered what you’d do with a model this size (or how to squeeze the most out of it), this episode will show you how small can punch far above its weight.
LINKS
Introducing Gemma 3 270M: The compact model for hyper-efficient AI (Google Developer Blog)
Full Model Fine-Tune Guide using Hugging Face Transformers
The Gemma 270M model on Hugging Face
The Gemma 270M model on Ollama
Building AI Agents with Gemma 3, a workshop with Ravin and Hugo (Code here)
From Images to Agents: Building and Evaluating Multimodal AI Workflows, a workshop with Ravin and Hugo (Code here)
Evaluating AI Agents: From Demos to Dependability, an upcoming workshop with Ravin and Hugo
Upcoming Events on Luma
Watch the podcast video on YouTube
🎓 Learn more:
Hugo's course, Building LLM Applications for Data Scientists and Software Engineers: https://maven.com/s/course/d56067f338 ($600 early-bird discount for the November cohort, available until August 16)
46min
August 12, 2025 · Episode 55: From Frittatas to Production LLMs: Breakfast at SciPy
Traditional software expects 100% passing tests. In LLM-powered systems, that’s not just unrealistic — it’s a feature, not a bug. Eric Ma leads research data science in Moderna’s data science and AI group, and over breakfast at SciPy we explored why AI products break the old rules, what skills different personas bring (and miss), and how to keep systems alive after the launch hype fades.
You’ll hear the clink of coffee cups, the murmur of SciPy in the background, and the occasional bite of frittata as we talk (hopefully also a feature, not a bug!)
We talk through:
• The three personas, and the blind spots each has when shipping AI systems
• Why “perfect” tests can be a sign you’re testing the wrong thing
• Development vs. production observability loops, and why you need both
• How curiosity about failing data separates good builders from great ones
• Ways large organizations can create space for experimentation without losing delivery focus
If you want to build AI products that thrive in the messy real world, this episode will help you embrace the chaos — and make it work for you.
LINKS
Eric's website
More about the workshops Eric and Hugo taught at SciPy
Upcoming Events on Luma
🎓 Learn more:
Hugo's course, Building LLM Applications for Data Scientists and Software Engineers: https://maven.com/s/course/d56067f338 ($600 early-bird discount for the November cohort, available until August 16)
39min
July 18, 2025 · Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference
Colab is cozy. But production won’t fit on a single GPU.
Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.
We talk through:
• From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration
• Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking
• Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts
• The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits
• Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer
If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you.
LINKS
Zach on LinkedIn
Hugo's blog post, Stop Building AI Agents
Upcoming Events on Luma
Hugo's recent newsletter about upcoming events and more!
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — https://maven.com/s/course/d56067f338
Zach's course (45% off for VG listeners!): Scratch to Scale: Large-Scale Training in the Modern World -- https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39
📺 Watch the video version on YouTube
42min
July 08, 2025 · Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs
Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s convinced most LLM failures have nothing to do with the model itself. They appear where the data is fuzzy, the prompts drift, or no one bothered to measure real-world behavior. Samuel joins me to show how a sprinkle of engineering discipline keeps those failures from ever reaching users.
We talk through:
• Tiny labels, big leverage: how five thumbs-ups/thumbs-downs are enough for Logfire to build a rubric that scores every call in real time
• Drift alarms, not dashboards: catching the moment your prompt or data shifts instead of reading charts after the fact
• Prompt self-repair: a prototype agent that rewrites its own system prompt—and tells you when it still doesn’t have what it needs
• The hidden cost curve: why the last 15 percent of reliability costs far more than the flashy 85 percent demo
• Business-first metrics: shipping features that meet real goals instead of chasing another decimal point of “accuracy”
If you’re past the proof-of-concept stage and staring down the “now it has to work” cliff, this episode is your climbing guide.
LINKS
Pydantic
Logfire
Upcoming Events on Luma
Hugo's recent newsletter about upcoming events and more!
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — next cohort starts July 8: https://maven.com/s/course/d56067f338
📺 Watch the video version on YouTube
45min
July 02, 2025 · Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)
Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries?
In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.
We cover:
• How to align retrieval with user intent and why cosine similarity is not the answer
• How a dumb YAML-based system outperformed so-called smart retrieval pipelines
• Why vague queries like “what is this all about” expose real weaknesses in most systems
• When vibe checks are enough and when formal evaluation is worth the effort
• How retrieval workflows can evolve alongside your product and user needs
If you are building LLM-powered systems and care about how they work, not just whether they work, this one is for you.
LINKS
Eric's website
Upcoming Events on Luma
Hugo's recent newsletter about upcoming events and more!
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers — next cohort starts July 8: https://maven.com/s/course/d56067f338
📺 Watch the video version on YouTube
29min

More shows like Vanishing Gradients

Data Skeptic by Kyle Polich · 477 Listeners

a16z Podcast by Andreessen Horowitz · 1,083 Listeners

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) by Sam Charrington · 434 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn · 301 Listeners

NVIDIA AI Podcast by NVIDIA · 342 Listeners

DataFramed by DataCamp · 268 Listeners

Practical AI by Practical AI LLC · 211 Listeners

Google DeepMind: The Podcast by Hannah Fry · 194 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST) · 89 Listeners

Dwarkesh Podcast by Dwarkesh Patel · 489 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction · 131 Listeners

Latent Space: The AI Engineer Podcast by swyx + Alessio · 97 Listeners

AI + a16z by a16z · 33 Listeners

High Signal: Data Science | Career | AI by Delphina · 18 Listeners

OpenAI Podcast by OpenAI · 52 Listeners