LessWrong (30+ Karma)

[Linkpost] “Claude is Now Alignment Pretrained” by RogerDearnaley



This is a link post.

Anthropic are now actively using the approach to alignment often called “Alignment Pretraining” or “Safety Pretraining”: using Stochastic Gradient Descent on a large body of natural or synthetic documents that show the AI assistant doing the right thing. They tried this out, found that it works well, and are now using it.

I’m absolutely delighted. I’ve been advocating this approach on LessWrong and the Alignment Forum for several years:

  • How to Control an LLM's Behavior (why my P(DOOM) went down)
  • Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
  • A "Bitter Lesson" Approach to Aligning AGI and ASI
  • Why Aligning an LLM is Hard, and How to Make it Easier
  • The Best Way to Align an LLM: Is Inner Alignment Now a Solved Problem?
  • Pretraining on Aligned AI Data Dramatically Reduces Misalignment—Even After Post-Training

I’ve been very excited about this alignment technique for a couple of years, ever since I read Pretraining Language Models with Human Preferences (Korbak et al., ’23), the seminal paper demonstrating that it was extremely effective. This was later followed up by Safety Pretraining: Toward the Next Generation [...]

---

First published:

May 13th, 2026

Source:

https://www.lesswrong.com/posts/Xqh9bDw7Ei5bExC6h/claude-is-now-alignment-pretrained-1

Linkpost URL:

https://www.anthropic.com/research/teaching-claude-why

---

Narrated by TYPE III AUDIO.
