A Chess-GPT Linear Emergent World Representation
Introduction. Among the many recent developments in ML, there were two I found interesting and wanted to dig into further. The first was gpt-3.5-turbo-instruct's ability to play chess at 1800 Elo. The fact that an LLM could learn to play chess well from random text scraped off the internet seemed almost magical. The second was Kenneth Li's Emergent World Representations paper (there is an excellent summary on The Gradient and a follow-up from Neel Nanda). In it, they trained a 25-million-parameter GPT to predict the next character in Othello games. It learns to make legal moves in games unseen in its training dataset, and both non-linear and linear probes showed that the model accurately tracks the state of the board.
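To make "linear probe" concrete, here is a minimal sketch of the idea: train a simple linear classifier to read a latent variable (e.g. the state of one board square) out of the model's hidden activations. The sizes, the three-way square encoding, and the synthetic activations below are illustrative assumptions, not the paper's actual setup, which trains probes on cached activations from the real model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes -- the real probes are trained on activations cached
# from a specific layer of the chess/Othello GPT.
d_model = 64      # hypothetical residual-stream width
n_samples = 2000
n_classes = 3     # one board square: empty / white piece / black piece

# Synthetic stand-in for activations: the square's state is (nearly) a linear
# function of the hidden state, which is exactly the hypothesis a linear
# probe tests.
true_W = rng.normal(size=(d_model, n_classes))
X = rng.normal(size=(n_samples, d_model))
y = (X @ true_W + 0.1 * rng.normal(size=(n_samples, n_classes))).argmax(axis=1)

def train_linear_probe(X, y, n_classes, lr=0.5, epochs=500):
    """Multinomial logistic regression via full-batch gradient descent."""
    W = np.zeros((X.shape[1], n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)            # softmax probabilities
        grad = (P - Y) / len(X)
        W -= lr * X.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

W, b = train_linear_probe(X, y, n_classes)
accuracy = ((X @ W + b).argmax(axis=1) == y).mean()
print(f"linear probe accuracy: {accuracy:.2f}")
```

If the probe classifies the square accurately, the board state is linearly decodable from the activations; in the full setup one such probe is trained per square, per layer.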
However, this only worked for a model trained on a synthetic [...]
---
Outline:
(00:07) A Chess-GPT Linear Emergent World Representation
(02:35) Training Chess GPT
(05:12) Chess-GPT's Internal World Model
(08:46) Probing for latent variables
(12:41) I fine-tuned GPT-2 on a 50/50 mix of OpenWebText and chess games, and it learned to play chess and continued to output plausible-looking text. Maybe there's something interesting to look at there?
---
Linkpost URL:
https://adamkarvonen.github.io/machine_learning/2024/01/03/chess-world-models.html
Narrated by TYPE III AUDIO.