First, addressing misconceptions that could come from the title or from the paper's framing as relevant to LLM scaling:
- The model didn't learn by observing many move traces from high-level games. Instead, the authors trained a 270M-parameter model to map (board state, legal move) pairs to Stockfish 16's predicted win probability after playing that move. This can be described as imitating an oracle that reflects Stockfish's ability at 50 milliseconds per move. They then evaluated a system that played the model's highest-value move in each position.
- The system exhibited some quirks that required workarounds during play.
- Board states were encoded in FEN notation, which carries no information about which positions occurred earlier in the game. This matters in a small number of situations, because a player can claim an immediate draw once the same position has occurred three times.
- The model is a classifier, so [...]
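The evaluation policy described above, scoring each legal move independently and playing the argmax, can be sketched as follows. Here `predict_win_prob` is a hypothetical stand-in for the trained 270M-parameter model, and the toy scoring rule exists only so the sketch runs:

```python
# Sketch of the paper's evaluation policy: play the move the value model
# rates highest. predict_win_prob is a hypothetical placeholder for the
# trained model, which maps a (board state, legal move) pair to an
# estimated win probability.

def predict_win_prob(fen: str, move: str) -> float:
    # Placeholder: a real system would run the trained transformer here.
    # This toy version just prefers moves earlier in the alphabet.
    return 1.0 - (ord(move[0]) - ord("a")) / 26.0

def pick_move(fen: str, legal_moves: list[str]) -> str:
    # Score every legal move independently and play the argmax.
    return max(legal_moves, key=lambda m: predict_win_prob(fen, m))

start_fen = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(pick_move(start_fen, ["e2e4", "d2d4", "g1f3"]))  # prints "d2d4"
```

Note that the policy sees only the current FEN string, which is why the repetition-draw rule mentioned above is invisible to it.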
---
https://www.lesswrong.com/posts/PXRi9FMrJjyBcEA3r/skepticism-about-deepmind-s-grandmaster-level-chess-without
Narrated by TYPE III AUDIO.