LessWrong (30+ Karma)

“My research agenda and work” by Seth Herd


This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment increasingly often since 2004.

Here's the research agenda in one breath: I'm trying to predict what the first transformative AI will be, in enough mechanistic detail that we can predict likely failure modes of its alignment. That's in service of finding interventions that address those failure modes efficiently, so that they can realistically be implemented even if timelines are short and work is rushed. I'm using my background in computational cognitive neuroscience to predict what might be called loosely brainlike AGI: LLMs with added human-like cognitive capacities.

I'll give a summary in the rest of this section, then give a little more depth on each major thread of my work in the remaining sections. All of it is pretty brief.

Approach and premises

Most alignment work falls roughly into one of two broad categories: empirical study of current systems ("prosaic alignment"), or theory about idealized agents ("agent foundations") (with much variation [...]

---

Outline:

(01:07) Approach and premises

(05:36) Philosophy of the approach

(07:36) 2. Technical work

(08:06) 2.1. Predicted paths to TCAI

(08:55) Memory (continuous learning)

(09:30) Executive function and metacognition

(10:40) 2.2. Predicted paths to (mis)alignment

(13:59) 3. My research background in computational cognitive neuroscience

(16:04) 4. Societal influences on AI safety

(17:10) 4.1. Government and public opinion on AI progress

(20:21) 4.2. AI progress and epistemics

(22:50) 5. Alignment targets

(23:14) 5.1. Corrigibility, DWIMAC, or instruction-following vs. value alignment targets

(26:19) 5.2. Stability as an alignment target

(28:03) 6. Future work

(32:52) 7. Collaboration

The original text contained 7 footnotes which were omitted from this narration.

---

First published:

June 5th, 2026

Source:

https://www.lesswrong.com/posts/MuLvZxMcy5WaKJu3H/my-research-agenda-and-work

---

Narrated by TYPE III AUDIO.
