Super Data Science: ML & AI Podcast with Jon Krohn

903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir



Has AI benchmarking reached its limit, and what can fill the gap it leaves? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data and the necessity of human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.


Additional materials: www.superdatascience.com/903


This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA.


Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.


In this episode you will learn:

  • (16:48) Sinan’s new podcast, Practically Intelligent
  • (21:54) What to know about the limits of AI benchmarking
  • (53:22) Alternatives to AI benchmarks
  • (1:01:23) The difficulties in getting a model to recognize its mistakes

    Super Data Science: ML & AI Podcast with Jon Krohn
    By Jon Krohn

    4.6 (290 ratings)


    More shows like Super Data Science: ML & AI Podcast with Jon Krohn

    • Data Skeptic by Kyle Polich (480 Listeners)
    • Talk Python To Me by Michael Kennedy (590 Listeners)
    • The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence) by Sam Charrington (441 Listeners)
    • NVIDIA AI Podcast by NVIDIA (331 Listeners)
    • Data Engineering Podcast by Tobias Macey (140 Listeners)
    • Machine Learning Guide by OCDevel (763 Listeners)
    • AI Today Podcast by AI & Data Today (156 Listeners)
    • DataFramed by DataCamp (267 Listeners)
    • Practical AI by Practical AI LLC (192 Listeners)
    • Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST) (88 Listeners)
    • AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning by Jaeden Schafer (141 Listeners)
    • This Day in AI Podcast by Michael Sharkey, Chris Sharkey (201 Listeners)
    • Latent Space: The AI Engineer Podcast by swyx + Alessio (75 Listeners)
    • The AI Daily Brief (Formerly The AI Breakdown): Artificial Intelligence News and Analysis by Nathaniel Whittemore (479 Listeners)
    • AI For Humans: Making Artificial Intelligence Fun & Practical by Kevin Pereira & Gavin Purcell (248 Listeners)