Work performed as a part of Neel Nanda's MATS 6.0 (Summer 2024) training program.
TLDR
This is an interim report on reverse-engineering Othello-GPT, an 8-layer transformer trained to take sequences of Othello moves and predict legal moves. We find evidence that Othello-GPT learns to compute the board state using many independent decision rules that are localized to small parts of the board. Though we cannot rule out that it also learns a single succinct algorithm in addition to these rules, our best guess is that Othello-GPT's learned algorithm is just a bag of independent heuristics.
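As context for "computing the board state": in the Othello-GPT literature this is typically established with a linear probe that reads each cell's state (blank / "mine" / "yours") out of the residual stream. Below is a minimal sketch of such a probe in PyTorch; the dimensions and names are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (assumed setup, not the report's code): a per-cell linear probe
# that reads "blank / mine / yours" board state from residual-stream activations.
import torch
import torch.nn as nn

d_model, n_cells, n_classes = 512, 64, 3  # illustrative width; 8x8 board; blank/mine/yours

probe = nn.Linear(d_model, n_cells * n_classes)

def board_state_logits(resid: torch.Tensor) -> torch.Tensor:
    """resid: (batch, seq, d_model) residual-stream activations at some layer.
    Returns (batch, seq, n_cells, n_classes) board-state logits per cell."""
    return probe(resid).view(*resid.shape[:-1], n_cells, n_classes)

# Training (cross-entropy against the true board state after each move) is omitted.
```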
Board state reconstruction
---
Outline:
(00:19) TLDR
(02:18) Review of Othello-GPT
(04:02) Project goal
(04:33) Results on box #1: Board reconstruction
(04:39) A circuit for how the model computes if a cell is blank or not blank
(05:59) An example of a logical rule for how the model computes if a cell is “mine” or “yours”
(07:32) Intra-layer phenomenology
(08:49) Results on box #2: Valid move prediction
(08:55) Direct logit attribution (Logit Lens)
(10:28) Board Pattern Neurons
(13:57) Clock Neurons
(15:14) Suppression behavior
(16:22) Future Work
(18:11) Acknowledgements
The original text contained 10 footnotes which were omitted from this narration.
The original text contained 2 images which were described by AI.
---
Narrated by TYPE III AUDIO.