Machine Learning Street Talk (MLST)

GSM-Symbolic paper - Iman Mirzadeh (Apple)



Iman Mirzadeh from Apple, who recently published the GSM-Symbolic paper, discusses the crucial distinction between intelligence and achievement in AI systems. He critiques current AI research methodologies, highlighting the limitations of Large Language Models (LLMs) in reasoning and knowledge representation.


SPONSOR MESSAGES:

***

Tufa AI Labs is a brand-new research lab in Zurich, started by Benjamin Crouzier, focussed on o-series-style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Go to https://tufalabs.ai/

***


TRANSCRIPT + RESEARCH:

https://www.dropbox.com/scl/fi/mlcjl9cd5p1kem4l0vqd3/IMAN.pdf?rlkey=dqfqb74zr81a5gqr8r6c8isg3&dl=0


TOC:

1. Intelligence vs Achievement in AI Systems

[00:00:00] 1.1 Intelligence vs Achievement Metrics in AI Systems

[00:03:27] 1.2 AlphaZero and Abstract Understanding in Chess

[00:10:10] 1.3 Language Models and Distribution Learning Limitations

[00:14:47] 1.4 Research Methodology and Theoretical Frameworks


2. Intelligence Measurement and Learning

[00:24:24] 2.1 LLM Capabilities: Interpolation vs True Reasoning

[00:29:00] 2.2 Intelligence Definition and Measurement Approaches

[00:34:35] 2.3 Learning Capabilities and Agency in AI Systems

[00:39:26] 2.4 Abstract Reasoning and Symbol Understanding


3. LLM Performance and Evaluation

[00:47:15] 3.1 Scaling Laws and Fundamental Limitations

[00:54:33] 3.2 Connectionism vs Symbolism Debate in Neural Networks

[00:58:09] 3.3 GSM-Symbolic: Testing Mathematical Reasoning in LLMs

[01:08:38] 3.4 Benchmark Evaluation and Model Performance Assessment


REFS:

[00:01:00] AlphaZero chess AI system, Silver et al.

https://arxiv.org/abs/1712.01815

[00:07:10] Game Changer: AlphaZero's Groundbreaking Chess Strategies, Sadler & Regan

https://www.amazon.com/Game-Changer-AlphaZeros-Groundbreaking-Strategies/dp/9056918184

[00:11:35] Cross-entropy loss in language modeling, Voita

http://lena-voita.github.io/nlp_course/language_modeling.html

[00:17:20] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs, Mirzadeh et al.

https://arxiv.org/abs/2410.05229

[00:21:25] Connectionism and Cognitive Architecture: A Critical Analysis, Fodor & Pylyshyn

https://www.sciencedirect.com/science/article/pii/001002779090014B

[00:28:55] Brain-to-body mass ratio scaling laws, Sutskever

https://www.theverge.com/2024/12/13/24320811/what-ilya-sutskever-sees-openai-model-data-training

[00:29:40] On the Measure of Intelligence, Chollet

https://arxiv.org/abs/1911.01547

[00:33:30] On definition of intelligence, Gignac et al.

https://www.sciencedirect.com/science/article/pii/S0160289624000266

[00:35:30] Defining intelligence, Wang

https://cis.temple.edu/~wangp/papers.html

[00:37:40] How We Learn: Why Brains Learn Better Than Any Machine... for Now, Dehaene

https://www.amazon.com/How-We-Learn-Brains-Machine/dp/0525559884

[00:39:35] Surfaces and Essences: Analogy as the Fuel and Fire of Thinking, Hofstadter and Sander

https://www.amazon.com/Surfaces-Essences-Analogy-Fuel-Thinking/dp/0465018475

[00:43:15] Chain-of-thought prompting, Wei et al.

https://arxiv.org/abs/2201.11903

[00:47:20] Test-time scaling laws in machine learning, Brown

https://podcasts.apple.com/mv/podcast/openais-noam-brown-ilge-akkaya-and-hunter-lightman-on/id1750736528?i=1000671532058

[00:47:50] Scaling Laws for Neural Language Models, Kaplan et al.

https://arxiv.org/abs/2001.08361

[00:55:15] Tensor product variable binding, Smolensky

https://www.sciencedirect.com/science/article/abs/pii/000437029090007M

[01:08:45] GSM-8K dataset, OpenAI

https://huggingface.co/datasets/openai/gsm8k
