Vanishing Gradients

Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs


Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s convinced most LLM failures have nothing to do with the model itself. They appear where the data is fuzzy, the prompts drift, or no one bothered to measure real-world behavior. Samuel joins me to show how a sprinkle of engineering discipline keeps those failures from ever reaching users.
We talk through:
• Tiny labels, big leverage: how five thumbs-ups/thumbs-downs are enough for Logfire to build a rubric that scores every call in real time
• Drift alarms, not dashboards: catching the moment your prompt or data shifts instead of reading charts after the fact
• Prompt self-repair: a prototype agent that rewrites its own system prompt—and tells you when it still doesn’t have what it needs
• The hidden cost curve: why the last 15 percent of reliability costs far more than the flashy 85 percent demo
• Business-first metrics: shipping features that meet real goals instead of chasing another decimal point of “accuracy”
If you’re past the proof-of-concept stage and staring down the “now it has to work” cliff, this episode is your climbing guide.
LINKS
Pydantic (https://pydantic.dev/)
Logfire (https://pydantic.dev/logfire)
Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
🎓 Learn more:
Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338), next cohort starts July 8
📺 Watch the video version on YouTube (https://youtube.com/live/wk6rPZ6qJSY?feature=share)

Get full access to Vanishing Gradients at hugobowne.substack.com/subscribe

Vanishing Gradients, by Hugo Bowne-Anderson


