Machine Learning Street Talk (MLST)

Prof. Subbarao Kambhampati - LLMs don't reason, they memorize (ICML2024 2/13)



Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems.


MLST is sponsored by Brave:

The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval-augmented generation. Try it now: get 2,000 free queries monthly at http://brave.com/api.


TOC (sorry, the ones baked into the MP3 were wrong due to LLM hallucination!)

[00:00:00] Intro

[00:02:06] Bio

[00:03:02] LLMs are n-gram models on steroids

[00:07:26] Is natural language a formal language?

[00:08:34] Natural language is formal?

[00:11:01] Do LLMs reason?

[00:19:13] Definition of reasoning

[00:31:40] Creativity in reasoning

[00:50:27] Chollet's ARC challenge

[01:01:31] Can we reason without verification?

[01:10:00] LLMs can't solve some tasks

[01:19:07] LLM Modulo framework

[01:29:26] Future trends of architecture

[01:34:48] Future research directions


YouTube version: https://www.youtube.com/watch?v=y1WnHpedi2A


Refs: (we didn't have space for URLs here, check YT video description instead)

  • Can LLMs Really Reason and Plan?
  • On the Planning Abilities of Large Language Models : A Critical Investigation
  • Chain of Thoughtlessness? An Analysis of CoT in Planning
  • On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks
  • LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks
  • Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
  • "Task Success" is not Enough
  • Partition function (number theory) (Srinivasa Ramanujan and G.H. Hardy's work)
  • Poincaré conjecture
  • Gödel's incompleteness theorems
  • ROT13 (Rotate13, "rotate by 13 places")
  • A Mathematical Theory of Communication (C. E. SHANNON)
  • Sparks of AGI
  • Kambhampati thesis on speech recognition (1983)
  • PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change
  • Explainable human-AI interaction
  • Tree of Thoughts
  • On the Measure of Intelligence (ARC Challenge)
  • Getting 50% (SoTA) on ARC-AGI with GPT-4o (Ryan Greenblatt ARC solution)
  • PROGRAMS WITH COMMON SENSE (John McCarthy) - "AI should be an advice taker program"
  • Original chain of thought paper
  • ICAPS 2024 Keynote: Dale Schuurmans on "Computing and Planning with Large Generative Models" (COT)
  • The Hardware Lottery (Hooker)
  • A Path Towards Autonomous Machine Intelligence (JEPA/LeCun)
  • AlphaGeometry
  • FunSearch
  • Emergent Abilities of Large Language Models
  • Language models are not naysayers (Negation in LLMs)
  • The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
  • Embracing negative results