Super Data Science: ML & AI Podcast with Jon Krohn

903: LLM Benchmarks Are Lying to You (And What to Do Instead), with Sinan Ozdemir



Has AI benchmarking reached its limit, and what can fill the gap? Sinan Ozdemir speaks to Jon Krohn about the lack of transparency in training data, the need for human-led quality assurance to detect AI hallucinations, when and why to be skeptical of AI benchmarks, and the future of benchmarking agentic and multimodal models.


Additional materials: www.superdatascience.com/903


This episode is brought to you by Trainium2, the latest AI chip from AWS, by Adverity, the conversational analytics platform, and by the Dell AI Factory with NVIDIA.


Interested in sponsoring a SuperDataScience Podcast episode? Email [email protected] for sponsorship information.


In this episode you will learn:

  • (16:48) Sinan’s new podcast, Practically Intelligent
  • (21:54) What to know about the limits of AI benchmarking
  • (53:22) Alternatives to AI benchmarks
  • (1:01:23) The difficulties in getting a model to recognize its mistakes


