April 08, 2026

“Role-playing vs Self-modelling” by Jan_Kulveit

6 minutes

In a recent debate on Twitter – which I recommend reading in full – David Chalmers argues:

"Claude doesn't role-play the assistant, it realizes the assistant. Role-playing and realization are quite distinct phenomena, even at the level of behavior and function."

Jack Lindsey questions this, pointing out evidence in the opposite direction:

"I'm curious what you'd say it's doing when it's sampling tokens on the user turn, or, say, on John F. Kennedy's turn in a transcript like:

H: When were you born?

John F. Kennedy: I was born in 1917.

It feels a bit odd to say that the model is realizing JFK? Or perhaps you'd say it's realizing "its conception of JFK" or something like that? That starts to sound a lot like "roleplaying JFK"

If the Assistant is distinct from JFK, do you think it's because post-training breaks the symmetry between the Assistant and other characters? This is intuitively plausible, but ultimately it's an empirical question whether this takes place, and there's a lot of empirical evidence that challenges this intuition. Or do you think it's because the Assistant, unlike JFK, has never been anything other than a construct of the LLM, and so [...]

---

Outline:

(01:56) Symmetry breaking

(02:55) Different sources of self-models

(05:08) Difference in internal representations

(06:01) Summary

The original text contained 2 footnotes which were omitted from this narration.

---

First published:

April 7th, 2026

Source:

https://www.lesswrong.com/posts/wGn9LXYAbzoJKXyyu/role-playing-vs-self-modelling

---

Narrated by TYPE III AUDIO.

By LessWrong