LessWrong (30+ Karma)

“Three ways to make Claude’s constitution better” by Parv Mahajan



The evening after Claude's new constitution was published, about 15 AI safety full-time employees (FTEs) and Astra fellows discussed the constitution, its weaknesses, and its implications. Afterward, I compiled some of their most compelling recommendations:

Increase transparency about the character training process.
Much of the document is purposefully hedged and vague in its exact prescriptions, so the training process used to instill the constitution is extremely load-bearing. We wish more of this information were in the accompanying blog post and supplementary material. We think this is unlikely to leak any trade secrets: even a blog-post-level overview, like the one given with the constitution in 2023, would provide valuable information to external researchers.


High-level overview of Constitutional AI from https://www.anthropic.com/news/claudes-constitution

We’re also interested in seeing more empirical data on behavioral changes resulting from the new constitution. For instance, would fine-tuning on the corrigibility section reduce alignment faking by Claude 3 Opus? We’d be interested in more evidence showing whether, and how, the constitution improved apparent alignment.

Increase data on edge-case behavior.
Expected behavior in several edge cases (e.g., action boundaries when the principal hierarchy is illegitimate) is extremely unclear. While Claude is expected to at most conscientiously object [...]



---

First published:

February 2nd, 2026

Source:

https://www.lesswrong.com/posts/SC4Zsr6hxspKEMqmR/three-ways-to-make-claude-s-constitution-better

---

Narrated by TYPE III AUDIO.
