Machine Learning Street Talk (MLST)

Sepp Hochreiter - LSTM: The Comeback Story?


Listen Later

Sepp Hochreiter, the inventor of LSTM (Long Short-Term Memory) networks – a foundational technology in AI. Sepp discusses his journey, the origins of LSTM, and why he believes his latest work, XLSTM, could be the next big thing in AI, particularly for applications like robotics and industrial simulation. He also shares his controversial perspective on Large Language Models (LLMs) and why reasoning is a critical missing piece in current AI systems.


SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Check out their super fast DeepSeek R1 hosting!

https://centml.ai/pricing/


Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich.


Goto https://tufalabs.ai/

***


TRANSCRIPT AND BACKGROUND READING:

https://www.dropbox.com/scl/fi/n1vzm79t3uuss8xyinxzo/SEPPH.pdf?rlkey=fp7gwaopjk17uyvgjxekxrh5v&dl=0


Prof. Sepp Hochreiter

https://www.nx-ai.com/

https://x.com/hochreitersepp

https://scholar.google.at/citations?user=tvUH3WMAAAAJ&hl=en


TOC:

1. LLM Evolution and Reasoning Capabilities

[00:00:00] 1.1 LLM Capabilities and Limitations Debate

[00:03:16] 1.2 Program Generation and Reasoning in AI Systems

[00:06:30] 1.3 Human vs AI Reasoning Comparison

[00:09:59] 1.4 New Research Initiatives and Hybrid Approaches


2. LSTM Technical Architecture

[00:13:18] 2.1 LSTM Development History and Technical Background

[00:20:38] 2.2 LSTM vs RNN Architecture and Computational Complexity

[00:25:10] 2.3 xLSTM Architecture and Flash Attention Comparison

[00:30:51] 2.4 Evolution of Gating Mechanisms from Sigmoid to Exponential


3. Industrial Applications and Neuro-Symbolic AI

[00:40:35] 3.1 Industrial Applications and Fixed Memory Advantages

[00:42:31] 3.2 Neuro-Symbolic Integration and Pi AI Project

[00:46:00] 3.3 Integration of Symbolic and Neural AI Approaches

[00:51:29] 3.4 Evolution of AI Paradigms and System Thinking

[00:54:55] 3.5 AI Reasoning and Human Intelligence Comparison

[00:58:12] 3.6 NXAI Company and Industrial AI Applications


REFS:

[00:00:15] Seminal LSTM paper establishing Hochreiter's expertise (Hochreiter & Schmidhuber)

https://direct.mit.edu/neco/article-abstract/9/8/1735/6109/Long-Short-Term-Memory


[00:04:20] Kolmogorov complexity and program composition limitations (Kolmogorov)

https://link.springer.com/article/10.1007/BF02478259


[00:07:10] Limitations of LLM mathematical reasoning and symbolic integration (Various Authors)

https://www.arxiv.org/pdf/2502.03671


[00:09:05] AlphaGo’s Move 37 demonstrating creative AI (Google DeepMind)

https://deepmind.google/research/breakthroughs/alphago/


[00:10:15] New AI research lab in Zurich for fundamental LLM research (Benjamin Crouzier)

https://tufalabs.ai


[00:19:40] Introduction of xLSTM with exponential gating (Beck, Hochreiter, et al.)

https://arxiv.org/abs/2405.04517


[00:22:55] FlashAttention: fast & memory-efficient attention (Tri Dao et al.)

https://arxiv.org/abs/2205.14135


[00:31:00] Historical use of sigmoid/tanh activation in 1990s (James A. McCaffrey)

https://visualstudiomagazine.com/articles/2015/06/01/alternative-activation-functions.aspx


[00:36:10] Mamba 2 state space model architecture (Albert Gu et al.)

https://arxiv.org/abs/2312.00752


[00:46:00] Austria’s Pi AI project integrating symbolic & neural AI (Hochreiter et al.)

https://www.jku.at/en/institute-of-machine-learning/research/projects/


[00:48:10] Neuro-symbolic integration challenges in language models (Diego Calanzone et al.)

https://openreview.net/forum?id=7PGluppo4k


[00:49:30] JKU Linz’s historical and neuro-symbolic research (Sepp Hochreiter)

https://www.jku.at/en/news-events/news/detail/news/bilaterale-ki-projekt-unter-leitung-der-jku-erhaelt-fwf-cluster-of-excellence/


YT: https://www.youtube.com/watch?v=8u2pW2zZLCs

...more
View all episodesView all episodes
Download on the App Store

Machine Learning Street Talk (MLST)By Machine Learning Street Talk (MLST)

  • 4.7
  • 4.7
  • 4.7
  • 4.7
  • 4.7

4.7

90 ratings


More shows like Machine Learning Street Talk (MLST)

View all
Data Skeptic by Kyle Polich

Data Skeptic

479 Listeners

The a16z Show by Andreessen Horowitz

The a16z Show

1,095 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

Super Data Science: ML & AI Podcast with Jon Krohn

302 Listeners

NVIDIA AI Podcast by NVIDIA

NVIDIA AI Podcast

333 Listeners

Y Combinator Startup Podcast by Y Combinator

Y Combinator Startup Podcast

228 Listeners

Practical AI by Practical AI LLC

Practical AI

204 Listeners

ManifoldOne by Steve Hsu

ManifoldOne

95 Listeners

Google DeepMind: The Podcast by Hannah Fry

Google DeepMind: The Podcast

207 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

517 Listeners

Big Technology Podcast by Alex Kantrowitz

Big Technology Podcast

501 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

130 Listeners

This Day in AI Podcast by Michael Sharkey, Chris Sharkey

This Day in AI Podcast

228 Listeners

AI + a16z by a16z

AI + a16z

36 Listeners

Training Data by Sequoia Capital

Training Data

40 Listeners

Complex Systems with Patrick McKenzie (patio11) by Patrick McKenzie

Complex Systems with Patrick McKenzie (patio11)

134 Listeners