Vanishing Gradients

Episode 60: 10 Things I Hate About AI Evals with Hamel Husain



Most AI teams find "evals" frustrating, but ML Engineer Hamel Husain argues they’re just using the wrong playbook. In this episode, he lays out a data-centric approach to systematically measure and improve AI, turning unreliable prototypes into robust, production-ready systems.

Drawing from his experience getting countless teams unstuck, Hamel explains why the solution requires a "revenge of the data scientists." He details the essential mindset shifts, error analysis techniques, and practical steps needed to move beyond guesswork and build AI products you can actually trust.

We talk through:

  • The 10(+1) critical mistakes that cause teams to waste time on evals
  • Why "hallucination scores" are a waste of time (and what to measure instead)
  • The manual review process that finds major issues in hours, not weeks
  • A step-by-step method for building LLM judges you can actually trust
  • How to use domain experts without getting stuck in endless review committees
  • Guest Bryan Bischof's "Failure as a Funnel" for debugging complex AI agents
If you're tired of ambiguous "vibe checks" and want a clear process that delivers real improvement, this episode provides the definitive roadmap.

    LINKS

    • Hamel's website and blog
    • Hugo speaks with Philip Carter (Honeycomb) about aligning your LLM-as-a-judge with your domain expertise
    • Hamel Husain on Lenny's podcast, which includes a live demo of error analysis
    • The episode of VG in which Hamel and Hugo talk about Hamel's "data consulting in Vegas" era
    • Upcoming Events on Luma
    • Watch the podcast video on YouTube
    • Hamel's AI evals course, which he teaches with Shreya Shankar (UC Berkeley): starts Oct 6 and this link gives 35% off! https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME
    • 🎓 Learn more:

      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers https://maven.com/s/course/d56067f338

Vanishing Gradients by Hugo Bowne-Anderson

Rated 5 out of 5 (11 ratings)


More shows like Vanishing Gradients

  • Data Skeptic by Kyle Polich, 477 listeners
  • a16z Podcast by Andreessen Horowitz, 1,083 listeners
  • The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) by Sam Charrington, 434 listeners
  • Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn, 301 listeners
  • NVIDIA AI Podcast by NVIDIA, 341 listeners
  • DataFramed by DataCamp, 268 listeners
  • Practical AI by Practical AI LLC, 210 listeners
  • Google DeepMind: The Podcast by Hannah Fry, 194 listeners
  • Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST), 89 listeners
  • Dwarkesh Podcast by Dwarkesh Patel, 489 listeners
  • No Priors: Artificial Intelligence | Technology | Startups by Conviction, 133 listeners
  • Latent Space: The AI Engineer Podcast by swyx + Alessio, 97 listeners
  • AI + a16z by a16z, 33 listeners
  • High Signal: Data Science | Career | AI by Delphina, 18 listeners
  • OpenAI Podcast by OpenAI, 52 listeners