Vanishing Gradients

Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)



While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply.

Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines.

We talk through:

  • Treating LLM workflows as ETL pipelines for unstructured text
  • Error analysis: why you need humans reviewing the first 50–100 traces
  • Guardrails like retries, validators, and “gleaning”
  • How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs
  • Cheap vs. expensive models: when to swap for savings
  • Where agents fit in (and where they don’t)

If you’ve ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents without breaking the bank.
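
For readers who want something concrete, the retry and validator guardrails named in the list above can be sketched roughly as follows. This is a minimal illustration and is not taken from the episode or from DocETL. The names `validate`, `run_with_guardrails`, and `call_model`, and the required `summary` key, are all hypothetical. `call_model` stands in for whatever function sends a prompt to your LLM and returns its text.

```python
import json
from typing import Callable, Optional


def validate(raw: str) -> Optional[str]:
    """Return an error message if the output is unusable, else None.

    Assumption for this sketch: a good answer is a JSON object
    that contains a "summary" key.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict) or "summary" not in data:
        return "missing required key 'summary'"
    return None


def run_with_guardrails(
    call_model: Callable[[str], str],  # hypothetical: prompt in, text out
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model, validate the output, and retry on failure."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(prompt + feedback)
        error = validate(raw)
        if error is None:
            return json.loads(raw)
        # Tell the model why the last attempt was rejected before retrying.
        feedback = f"\n\nYour previous answer was rejected ({error}). Try again."
    raise ValueError(f"no valid output after {max_attempts} attempts")
```

In a real pipeline the validator would encode whatever checks error analysis of the first traces turned up, and the attempt limit keeps a bad document from consuming unbounded model calls.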

    LINKS

    • Shreya's website
    • DocETL, A system for LLM-powered data processing
    • Upcoming Events on Luma
    • Watch the podcast video on YouTube
    • Shreya's AI evals course, which she teaches with Hamel "Evals" Husain
    • 🎓 Learn more:

      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)

        Vanishing Gradients, by Hugo Bowne-Anderson


