Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Chess as a case study in hidden capabilities in ChatGPT, published by AdamYedidia on August 21, 2023 on LessWrong.
There are lots of funny videos of ChatGPT playing chess, and all of them have the same premise: ChatGPT doesn't know how to play chess, but it will cheerfully and confidently make lots of illegal moves, and humoring its blundering attempts to play a game it apparently doesn't understand is great content.
What's less well-known is that ChatGPT actually can play chess when correctly prompted. It plays at around 1000 Elo and makes consistently legal moves until about 20-30 moves in, when its performance tends to break down. That sounds unimpressive until you consider that it's effectively playing blindfolded: it has access only to the game's moves in algebraic notation, not a visual of a chessboard. I myself have probably spent at least a thousand hours playing chess, and I think I could do a little better than 1000 Elo for 30 moves when blindfolded, but not by much. ChatGPT's performance is roughly the level of blindfolded chess ability you'd expect from a decent club player. And 30 moves is more than enough to demonstrate beyond any reasonable doubt that ChatGPT has fully internalized the rules of chess and is not relying on memorization or other shallower patterns.
The "magic prompt" that I've been using is the following:
1. e4
and then in my later replies, providing the full current game score to ChatGPT as my message to it, e.g.:
2. Nh3 fxe4
3. Nf4 Nf6
4. b4 e5
5. b5
This "magic prompt" isn't original to me - soon after GPT-4 came out, a friend of mine told me about it, having seen it as a comment on HackerNews. (Sorry, anonymous HackerNews commenter - I'd love to credit you further, and will if you find this post and message me.)
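The prompt format above can be sketched in code. Below is a minimal, illustrative Python helper (my own sketch, not from the post) that rebuilds the numbered game score to send as each message, given the moves played so far in standard algebraic notation; the function name and the example moves are illustrative.

```python
# Minimal sketch (illustrative, not from the post): rebuild the numbered
# game score that gets sent to ChatGPT as each message, e.g.
# "1. e4 e5 2. Nf3", from a flat list of SAN moves.
def game_score(moves):
    parts = []
    for i, san in enumerate(moves):
        if i % 2 == 0:  # white's move: prefix the move number
            parts.append(f"{i // 2 + 1}. {san}")
        else:           # black's move: just the move itself
            parts.append(san)
    return " ".join(parts)

print(game_score(["e4"]))                  # the opening prompt: "1. e4"
print(game_score(["e4", "e5", "Nf3"]))     # "1. e4 e5 2. Nf3"
```

A fuller harness would also check each of ChatGPT's replies for legality before appending it; the third-party python-chess library's `Board.push_san` does this, raising an error on illegal moves.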
The especially interesting thing about this is the sharp contrast between how ChatGPT-3.5 performs with and without the prompt. With the prompt, ChatGPT plays consistently legally, and even passably well, for the first 30 or so moves; without the prompt, ChatGPT is essentially unable to play a fully legal game of chess.
Here are a few example games of ChatGPT playing or attempting to play chess under various conditions.
ChatGPT-3.5, with the magic prompt
Playing against me
Lichess study, ChatGPT conversation link
I play white, ChatGPT plays black. In this game, I intentionally play a bizarre opening, in order to quickly prove that ChatGPT isn't relying on memorized openings or ideas in its play. This game isn't meant to show that ChatGPT can play well (since I'm playing atrociously here), only that it can play legally in a novel game. In my view, this game alone is more than enough evidence to put to bed the notion that ChatGPT "doesn't know" the rules of chess or that it's just regurgitating half-remembered ideas from its training set; it very clearly has an internal representation of the board, and fully understands the rules. In order to deliver checkmate on move 19 with 19...Qe8# (which it does deliberately, outputting the "#" sign, which indicates checkmate), ChatGPT needed to "see" the contributions of at least six different black pieces at once (the bishop on g4, the two pawns on g7 and h6, the king on f8, the queen on e8, and either the rook on h8 or the knight on f6).
Playing against Lichess Stockfish Level 1
Lichess game, ChatGPT conversation link
Stockfish level 1 has an Elo of around 850. Stockfish is playing white and ChatGPT is playing black. In this game, ChatGPT quickly gains a dominating material advantage and checkmates Stockfish Level 1 on move 22.
Playing against Lichess Stockfish Level 2
Lichess game, ChatGPT conversation link
Stockfish level 2 has an Elo of around 950. Stockfish is playing white and ChatGPT is playing black. In this game, ChatGPT starts a dangerous kingside attack and gai...