
Paper authors: Erik Jenner, Shreyas Kapur, Vasil Georgiev, Cameron Allen, Scott Emmons, Stuart Russell
TL;DR: We released a paper with what I consider clear evidence of learned look-ahead in a chess-playing network (i.e., the network considers future moves when deciding on its current one). This post shows some of our results, then describes the original motivation for the project and reflects on how it went. I think the results are interesting from a scientific and perhaps an interpretability perspective, but only mildly useful for AI safety.
Teaser for the results
(This section is copied from our project website. You may want to read it there for animations and interactive elements, then come back here for my reflections.)
Do neural networks learn to implement algorithms involving look-ahead or search in the wild? Or do they only ever learn simple heuristics? We investigate this question for Leela Chess Zero, arguably the [...]
---
Outline:
(00:41) Teaser for the results
(01:20) Setup
(02:18) Activations associated with future moves are crucial
(04:35) Probes can predict future moves
(05:28) More results
(06:19) The origins of this project
(09:05) Theories of change
(10:05) How it went
(12:13) Sidenote: look-ahead vs search
(13:29) Takeaways for interpretability
(13:51) Creating an input distribution using a weaker model
(16:23) We relied on established mech interp tools more than expected
(17:08) Probing for complex things is difficult
(17:57) There are many more angles of attack than time to pursue them all
(19:27) Good infrastructure is extremely helpful
(20:13) Relevance to AI safety
---
First published:
Source:
Narrated by TYPE III AUDIO.