
Produced while an affiliate at PIBBSS[1]. The work was done initially with funding from a Lightspeed Grant, and then continued while at PIBBSS. This work was done in collaboration with @Paul Riechers, @Lucas Teixeira, @Alexander Gietelink Oldenziel, and Sarah Marzen. Paul was a MATS scholar during some portion of this work. Thanks to Paul, Lucas, Alexander, and @Guillaume Corlouer for suggestions on this writeup.
Introduction. What computational structure are we building into LLMs when we train them on next-token prediction? In this post we present evidence that this structure is given by the meta-dynamics of belief updating over the hidden states of the data-generating process. We'll explain exactly what this means in the post. We are excited by these results because...
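To make "belief updating over hidden states" concrete, here is a minimal sketch in Python (assuming NumPy). The two-state hidden Markov model and its token-labeled transition matrices below are hypothetical stand-ins, not the data-generating process used in the post; the update rule itself is just Bayes' theorem applied one token at a time.

```python
import numpy as np

# Hypothetical 2-state HMM over a binary token alphabet, for illustration
# only. T[x][i, j] = P(emit token x AND move to hidden state j | state i),
# so summing the matrices over all tokens gives a row-stochastic matrix.
T = {
    0: np.array([[0.6, 0.1],
                 [0.2, 0.2]]),
    1: np.array([[0.1, 0.2],
                 [0.3, 0.3]]),
}

def update_belief(belief, token):
    """One step of Bayesian belief updating: belief' ∝ belief @ T[token].

    Each distinct belief reachable this way is a state of the
    mixed-state presentation (MSP) discussed later in the post.
    """
    unnormalized = belief @ T[token]
    return unnormalized / unnormalized.sum()

# Starting from a prior over hidden states (uniform here for simplicity),
# the sequence of beliefs induced by a token stream traces out the
# belief-state geometry that, per the post, transformers come to
# represent in their residual stream.
belief = np.array([0.5, 0.5])
for tok in [0, 1, 1, 0]:
    belief = update_belief(belief, tok)
    print(tok, belief.round(3))
```

The "meta-dynamics" framing is visible here: the dynamics of the HMM move the hidden state, while the update above moves a distribution over hidden states, and it is this second, derived process that the post argues transformers learn.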
---
Outline:
(02:17) Theoretical Framework
(04:25) Do Transformers Learn a Model of the World?
(06:03) The Structure of Belief State Updating
(08:37) The Mixed-State Presentation
(12:34) Experiment and Results
(12:38) Experimental Design
(14:08) The Data-Generating Process and MSP
(16:26) The Results!
(18:06) Limitations and Next Steps
(18:10) Limitations
(20:35) Next Steps
The original text contained 10 footnotes which were omitted from this narration.
---
Narrated by TYPE III AUDIO.