June 05, 2026

“My research agenda and work” by Seth Herd

34 minutes

This is a summary of the work I've done and work I plan to do, and the theories of change and AI progress that motivate my work. I've been working full-time on alignment for three years and change, and thinking about brainlike AGI and its alignment increasingly often since 2004.

Here's the research agenda in one breath: I'm trying to predict what the first transformative AI will be, in enough mechanistic detail that we can predict likely failure modes of its alignment. That's in service of finding interventions that address those failure modes efficiently, so that they can realistically be implemented even if timelines are short and work is rushed. I'm using my background in computational cognitive neuroscience to predict what might be called loosely brainlike AGI: LLMs with added human-like cognitive capacities.

I'll give a summary in the rest of this section, then give a little more depth on each major thread of my work in the remaining sections. All of it is pretty brief.

Approach and premises

Most alignment work falls roughly into one of two broad categories: empirical study of current systems ("prosaic alignment"), or theory about idealized agents ("agent foundations") (with much variation [...]

---

Outline:

(01:07) Approach and premises

(05:36) Philosophy of the approach

(07:36) 2. Technical work

(08:06) 2.1. Predicted paths to TCAI

(08:55) Memory (continuous learning)

(09:30) Executive function and metacognition

(10:40) 2.2. Predicted paths to (mis)alignment

(13:59) 3. My research background in computational cognitive neuroscience

(16:04) 4. Societal influences on AI safety

(17:10) 4.1. Government and public opinion on AI progress

(20:21) 4.2. AI progress and epistemics

(22:50) 5. Alignment targets

(23:14) 5.1. Corrigibility, DWIMAC, or instruction-following vs. value alignment targets

(26:19) 5.2. Stability as an alignment target

(28:03) 6. Future work

(32:52) 7. Collaboration

The original text contained 7 footnotes which were omitted from this narration.

---

First published:

June 5th, 2026

Source:

https://www.lesswrong.com/posts/MuLvZxMcy5WaKJu3H/my-research-agenda-and-work

---

Narrated by TYPE III AUDIO.


By LessWrong

