LessWrong (30+ Karma)

““Act-based approval-directed agents”, for IDA skeptics” by Steven Byrnes



Summary / tl;dr

In the 2010s, Paul Christiano built an extensive body of work on AI alignment—see the “Iterated Amplification” series for a curated overview as of 2018.

One foundation of this program was an intuition that it should be possible to build “act-based approval-directed agents” (“approval-directed agents” for short). These AGIs, for example, would not lie to their human supervisors, because their human supervisors wouldn’t want them to lie, and these AGIs would only do things that their human supervisors would want them to do. (It sounds much simpler than it is!)

Another foundation of this program was a set of algorithmic approaches, Iterated Distillation and Amplification (IDA), that supposedly offers a path to actually building these approval-directed AI agents.

I am (and have always been) a skeptic of IDA: I just don’t think any of those algorithms would work very well.[1]

But I still think there might be something to the “approval-directed agents” intuition. And we should be careful not to throw out the baby with the bathwater.

So my goal in this post is to rescue the “approval-directed agents” idea from its IDA baggage. Here's the roadmap:

In Section 1, I offer a high-level picture of [...]

---

Outline:

(00:11) Summary / tl;dr

(02:09) 1. The easy and hard problems of wireheading, observation-utility agents, and approval-directed agents

(05:36) 2. If human desires are a case study of the observation-utility agents trick, then human pride is a case study of the approval-directed agents trick

The original text contained 6 footnotes which were omitted from this narration.

---

First published:

March 18th, 2026

Source:

https://www.lesswrong.com/posts/RKtTi82t8X8TQy5FX/act-based-approval-directed-agents-for-ida-skeptics

---

Narrated by TYPE III AUDIO.

