Super Data Science: ML & AI Podcast with Jon Krohn

903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir



Has AI benchmarking reached its limit, and if so, what can fill the gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data, the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.


Additional materials: www.superdatascience.com/903


This episode is brought to you by Trainium2, the latest AI chip from AWS; by Adverity, the conversational analytics platform; and by the Dell AI Factory with NVIDIA.


Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.


In this episode you will learn:

  • (16:48) Sinan’s new podcast, Practically Intelligent
  • (21:54) What to know about the limits of AI benchmarking
  • (53:22) Alternatives to AI benchmarks
  • (1:01:23) The difficulties in getting a model to recognize its mistakes

    Super Data Science: ML & AI Podcast with Jon Krohn
    By Jon Krohn

    4.6

    295 ratings

