


In a recent debate on Twitter – which I recommend reading in full – David Chalmers argues:
"Claude doesn't role-play the assistant, it realizes the assistant. Role-playing and realization are quite distinct phenomena, even at the level of behavior and function."
Jack Lindsey questions this, pointing out evidence in the opposite direction:
"I'm curious what you'd say it's doing when it's sampling tokens on the user turn, or, say, on John F. Kennedy's turn in a transcript like:
H: When were you born?
John F. Kennedy: I was born in 1917.
It feels a bit odd to say that the model is realizing JFK? Or perhaps you'd say it's realizing "its conception of JFK" or something like that? That starts to sound a lot like "roleplaying JFK"
If the Assistant is distinct from JFK, do you think it's because post-training breaks the symmetry between the Assistant and other characters? This is intuitively plausible, but ultimately it's an empirical question whether this takes place, and there's a lot of empirical evidence that challenges this intuition. Or do you think it's because the Assistant, unlike JFK, has never been anything other than a construct of the LLM, and so [...]
---
Outline:
(01:56) Symmetry breaking
(02:55) Different sources of self-models
(05:08) Difference in internal representations
(06:01) Summary
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
By LessWrong
