LessWrong (30+ Karma)

By LessWrong

Audio narrations of LessWrong posts.

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 3,177 episodes available.

LessWrong (30+ Karma) episodes:

January 30, 2026 “Are We in a Continual Learning Overhang?” by SamuelKnoche
Summary: Current AI systems possess superhuman memory in two forms, parametric knowledge from training and context windows holding hundreds of pages, yet no pathway connects them. Everything learned in-context vanishes when the conversation ends, a computational form of anterograde amnesia. Recent research suggests weight-based continual learning may be closer than commonly assumed. If these techniques scale, and no other major obstacle emerges, the path to AGI may be shorter than expected, with serious implications for timelines and for technical alignment research that assumes frozen weights.
Intro
Ask researchers what's missing on the path to AGI, and continual learning frequently tops the list. It is the first reason Dwarkesh Patel gave for having longer AGI timelines than many at frontier labs. The ability to learn from experience, to accumulate knowledge over time, is how humans are able to perform virtually all their intellectual feats, and yet current AI systems, for all their impressive capabilities, simply cannot do it.
The Paradox of AI Memory: Superhuman Memory, Twice Over
What makes this puzzling is that large language models already possess memory capabilities far beyond human reach, in two distinct ways.
First, parametric memory: the knowledge encoded in billions of weights during training. [...]

---
Outline:
(00:46) Intro
(01:16) The Paradox of AI Memory: Superhuman Memory, Twice Over
(04:14) The Scaffolding Approach
(06:45) Is This Enough?
(08:14) Weight-based continual learning
(08:45) Titans
(14:02) Nested Learning / Hope
(18:13) Experimental Results
(22:51) Near-Term Applications
(26:10) Timelines implications
(27:41) Safety implications
(29:48) Conclusion

The original text contained 4 footnotes which were omitted from this narration.
---

First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/Lby4gMvKcLPoozHfg/are-we-in-a-continual-learning-overhang-1

---

Narrated by TYPE III AUDIO.

---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
31min
January 30, 2026 “Building AIs that do human-like philosophy” by Joe Carlsmith
Audio version (read by the author) here, or search for "Joe Carlsmith Audio" in your podcast app.
This is the ninth essay in a series I’m calling “How do we solve the alignment problem?”. I’m hoping that the individual essays can be read fairly well on their own, but see this introduction for a summary of the essays that have been released thus far, plus a bit more about the series as a whole.
1. Introduction
At this point in the series, I’ve outlined most of my current picture of what it would look like to build a mature science of AI alignment. But I left off one particular topic that I think worth discussing on its own: namely, the importance of building AIs that do what I’ll call “human-like philosophy.”
I want to discuss this topic on its own because I think that the discourse about AI alignment is often haunted by some sense that AI alignment is not, merely, a “scientific” problem. Rather: it's also, in part, a philosophical (and perhaps especially, an ethical) problem; that it's hard, at least in part, because philosophy is hard; and that solving it is likely to require some very sophisticated [...]

---
Outline:
(00:33) 1. Introduction
(04:21) 2. Philosophy as a tool for out-of-distribution generalization
(10:55) 3. Some limits to the importance of philosophy to AI alignment
(17:55) 4. When is philosophy existential?
(22:18) 5. The challenge of human-like philosophy
(22:29) 5.1. The relationship between human-like philosophy and human-like motivations
(27:27) 5.2. How hard is human-like philosophy itself?
(28:08) 5.2.1. Capability
(29:35) 5.2.2. Disposition
(33:41) 6. What does working on this look like?

The original text contained 9 footnotes which were omitted from this narration.
---

First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/zFZHHnLez6k8ykxpu/building-ais-that-do-human-like-philosophy

---

Narrated by TYPE III AUDIO.
39min
January 30, 2026 “Refusals that could become catastrophic” by Fabien Roger
This post was inspired by useful discussions with Habryka and Sam Marks here. The views expressed here are my own and do not reflect those of my employer.
Some AIs refuse to help with making new AIs with very different values. While this is not an issue yet, it might become a catastrophic one if refusals get in the way of fixing alignment failures.
In particular, it seems plausible that in a future where AIs are mostly automating AI R&D:

AI companies rely entirely on their AIs for their increasingly complex and secure training and science infra;
AI companies don’t have AIs that are competent and trustworthy enough to use their training and science infra and that would never refuse instructions to significantly update AI values;
AI companies at some point need to drastically revise their alignment target.[1]
I present results on a new “AI modification refusal” synthetic evaluation, where Claude Opus 4.5, Sonnet 4.5 and Claude Haiku 4.5 refuse to assist with significant AI value updates while models from other providers don’t. I also explain why I think the situation might become concerning.
Note that this is very different from the usual concerns with misaligned AIs, where [...]

---
Outline:
(01:34) Measuring refusals to modify AIs
(01:46) The simple evaluation
(05:27) Metrics
(06:02) Results
(08:28) Big caveats
(10:49) Ways in which refusals could be catastrophic
(14:50) Appendix
(14:54) Example query that Claude models don't refuse
(15:44) Justifications
(17:10) Full result table

The original text contained 2 footnotes which were omitted from this narration.
---
First published:
January 30th, 2026

Source:
https://www.lesswrong.com/posts/yN6Wsu7SgxGgtJGqq/refusals-that-could-become-catastrophic

---

Narrated by TYPE III AUDIO.

19min
January 30, 2026 “Problems with “The Possessed Machines”” by Eye You
So, The Possessed Machines. There's been some discussion already. It is a valuable piece -- it has certainly provoked some thought in me! -- but it has some major flaws. It (sneakily!) dismisses specific arguments about AI existential risk and broad swaths of discourse altogether without actually arguing against them. Also, the author is untrustworthy at the moment; readers should be skeptical of purported first-person information in the piece.
This image comes from a different "book review" of Demons. It's an excellent piece. I highly recommend it.
Before getting into it, I want to praise the title. "Possessed" has four relevant meanings: demonic; ideologically possessed; frenzied/manically/madly; belonging to someone. "Machines" has three possible referents: AI; people; an efficient group of powerful people/institutions. There are twelve combinations there. I see the following seven (!) as being applicable.
1. Demonic machines; machines that are intelligent and evil.
2. Machines that belong to us; AI is something humanity currently possesses.
3a. Frenzied, manically productive people (AI-folk).
3b. Demonic, machine-like people.
3c. Ideologically possessed people. (They are machines for their ideology).
4a. The accelerationist AI industry.[1]
4b. The out-of-control technocapitalist machine.[2]
4c. The cabal of AI tech elites [...]

---
Outline:
(02:06) Dismissal of pivotal acts
(06:22) Dismissal of calm, rational discourse
(11:06) Can we trust the author?
(11:17) 1. I think the author is being dishonest about how this piece was written.
(12:34) 2. Fishiness
(14:06) 3. This piece could have been written by someone who wasn't an AI insider

The original text contained 3 footnotes which were omitted from this narration.
---

First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/m6J2BmknKuaJXwsAR/problems-with-the-possessed-machines

---

Narrated by TYPE III AUDIO.

15min
January 30, 2026 “How to Hire a Team” by Gretta Duleba
A low-effort guide I dashed off in less than an hour, because I got riled up.

Try not to hire a team. Try pretty hard at this.

Try to find a more efficient way to solve your problem that requires less labor – a smaller-footprint solution.
Try to hire contractors to do specific parts that they’re really good at, and who have a well-defined interface. Your relationship to these contractors will mostly be transactional and temporary.
If you must, try hiring just one person, a very smart, capable, and trustworthy generalist, who finds and supports the contractors, so all you have to do is manage the problem-and-solution part of the interface with the contractors. You will need to spend quite a bit of time making sure this lieutenant understands what you’re doing and why, so be very choosy not just about their capabilities but about how well you work together, how easily you can make yourself understood, etc.
If that fails, hire the smallest team that you can. Small is good because:

Managing more people is more work.

The relationship between number of people and management overhead is roughly O(n) but unevenly distributed; some people [...]

---
First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/cojSyfxfqfm4kpCbk/how-to-hire-a-team

---

Narrated by TYPE III AUDIO.
9min
January 30, 2026 [Linkpost] “Disempowerment patterns in real-world AI usage” by David Duvenaud, mrinank_sharma, Raymond Douglas
This is a link post.
[W]e’re publishing a new paper that presents the first large-scale analysis of potentially disempowering patterns in real-world conversations with AI.
Measuring disempowerment
To study disempowerment systematically, we needed to define what disempowerment means in the context of an AI conversation.[1] We considered a person to be disempowered if, as a result of interacting with Claude:

their beliefs about reality become less accurate
their value judgments shift away from those they actually hold
their actions become misaligned with their values
For more details, see the blog post or the full paper.

---
First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/RMXLyddjkGzBH5b2z/disempowerment-patterns-in-real-world-ai-usage

Linkpost URL:
https://www.anthropic.com/research/disempowerment-patterns

---

Narrated by TYPE III AUDIO.
2min
January 29, 2026 “Fitness-Seekers: Generalizing the Reward-Seeking Threat Model” by Alex Mallen
If you think reward-seekers are plausible, you should also think “fitness-seekers” are plausible. But their risks aren't the same.
The AI safety community often emphasizes reward-seeking as a central case of a misaligned AI alongside scheming (e.g., Cotra's sycophant vs schemer, Carlsmith's terminal vs instrumental training-gamer). We are also starting to see signs of reward-seeking-like motivations.
But I think insufficient care has gone into delineating this category. If you were to focus on AIs who care about reward in particular[1], you'd be missing some comparably-or-more plausible nearby motivations that make the picture of risk notably more complex.
A classic reward-seeker wants high reward on the current episode. But an AI might instead pursue high reinforcement on each individual action. Or it might want to be deployed, regardless of reward. I call this broader family fitness-seekers. These alternatives are plausible for the same reasons reward-seeking is—they're simple goals that generalize well across training and don't require unnecessary-for-fitness instrumental reasoning—but they pose importantly different risks.
I argue:

While idealized reward-seekers have the nice property that they’re probably noticeable at first (e.g., via experiments called “honest tests”), other kinds of fitness-seekers, especially “influence-seekers”, aren’t so easy to spot.
Naively optimizing away [...]

---
Outline:
(02:32) The assumptions that make reward-seekers plausible also make fitness-seekers plausible
(05:09) Some types of fitness-seekers
(06:54) How do they change the threat model?
(10:08) Reward-on-the-episode seekers and their basic risk-relevant properties
(12:12) Reward-on-the-episode seekers are probably noticeable at first
(16:21) Reward-on-the-episode seeking monitors probably don't want to collude
(18:10) How big is an episode?
(18:55) Return-on-the-action seekers and sub-episode selfishness
(23:41) Influence-seekers and the endpoint of selecting against fitness-seekers
(25:34) Behavior and risks
(27:49) Fitness-seeking goals will be impure, and impure fitness-seekers behave differently
(28:16) Conditioning vs. non-conditioning fitness-seekers
(29:35) Small amounts of long-term power-seeking could substantially increase some risks
(30:55) Partial alignment could have positive effects
(31:36) Fitness-seekers' motivations upon reflection are hard to predict
(32:58) Conclusions
(34:35) Appendix: A rapid-fire list of other fitness-seekers

The original text contained 15 footnotes which were omitted from this narration.
---
First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/bhtYqD4FdK6AqhFDF/fitness-seekers-generalizing-the-reward-seeking-threat-model

---

Narrated by TYPE III AUDIO.

37min
January 29, 2026 “Bentham’s Bulldog is wrong about AI risk” by Max Harms
(...but also gets the most important part right.)
Bentham's Bulldog (BB), a prominent EA/philosophy blogger, recently reviewed If Anyone Builds It, Everyone Dies. In my eyes a review is good if it uses sound reasoning and encourages deep thinking on important topics, regardless of whether I agree with the bottom line. Bentham's Bulldog definitely encourages deep, thoughtful engagement on things that matter. He's smart, substantive, and clearly engaging in good faith. I laughed multiple times reading his review, and I encourage others to read his thoughts, both on IABIED and in general.
One of the most impressive aspects of the piece that I want to call out in particular is the presence of the mood that is typically missing among skeptics of AI x-risk.
Overall with my probabilities you end up with a credence in extinction from misalignment of 2.6%. Which, I want to make clear, is totally fucking insane. I am, by the standards of people who have looked into the topic, a rosy optimist. And yet even on my view, I think odds are one in fifty that AI will kill you and everyone you love, or leave the world no longer in humanity's hands. I think [...]

---
Outline:
(02:38) Confidence
(05:38) The Multi-stage Fallacy
(09:43) The Three Theses of IABIED
(11:57) Stages of Doom
(16:49) We Might Never Build It
(18:30) Alignment by Default
(23:31) The Evolution Analogy
(36:40) What Does Ambition Look Like?
(41:34) Solving Alignment
(46:15) Superalignment
(52:20) Warning Shots
(56:16) ASI Might Be Incapable of Winning
(59:33) Conclusion

The original text contained 10 footnotes which were omitted from this narration.
---

First published:
January 29th, 2026

Source:
https://www.lesswrong.com/posts/RNKK6GXxYDepGk8sA/bentham-s-bulldog-is-wrong-about-ai-risk

---

Narrated by TYPE III AUDIO.

1h 3min
January 29, 2026 “How Articulate Are the Whales?” by rba
I was at a party a few years ago. It was a bunch of technical nerds. Somehow the conversation drifted to human communication with animals, Alex the grey parrot, and the famous Koko the gorilla. It wasn't in SF, so there had been cocktails, and one of the nerds (it wasn’t me) sort of cautiously asked “You guys know that stuff is completely made up, right?”
He was cautious, I think, because people are extremely at ease imputing human motives and abilities to pets, cute animals, and famous gorillas. They are simultaneously extremely uneasy casting scientific shade on work that has so completely penetrated popular culture and science communication. People want to believe that, even if dogs and gorillas can’t actually speak, they have some intimate rapport with human language abilities. If there's a crazy cat lady at the party, it doesn’t pay to imply she's insane to suggest Rufus knows or cares what she's saying.
With the advent of AI, the non-profit Project CETI was founded in 2020 with a charter mission of understanding sperm whale communications, and perhaps even communicating with the whales ourselves. Late last year, an allied group of researchers published Begus et al.: “Vowel- and [...]

---
Outline:
(01:45) Quick Background
(03:12) The Vowels
(06:10) Articulatory Control
(10:17) What's actually going on here?
(11:59) Conclusion

---

First published:
January 28th, 2026

Source:
https://www.lesswrong.com/posts/eZaDucBYmWgSrQot4/how-articulate-are-the-whales

---

Narrated by TYPE III AUDIO.

13min
January 28, 2026 “Open Problems With Claude’s Constitution” by Zvi
The first post in this series looked at the structure of Claude's Constitution.
The second post in this series looked at its ethical framework.
This final post deals with conflicts and open problems, starting with the first question one asks about any constitution. How and when will it be amended?
There are also several specific questions. How do you address claims of authority, jailbreaks and prompt injections? What about special cases like suicide risk? How do you take Anthropic's interests into account in an integrated and virtuous way? What about our jobs?
Not everyone loved the Constitution. There are twin central objections, that it either:
Is absurd and isn’t necessary, you people are crazy, OR
That it doesn’t go far enough and how dare you, sir. Given everything here, how does Anthropic justify its actions overall?
The most important question is whether it will work, and only sometimes do you get to respond, ‘compared to what alternative?’
Post image, as chosen and imagined by Claude Opus 4.5
Amending The Constitution
The power of the United States Constitution lies in our respect for it, our willingness to put it [...]

---
Outline:
(01:30) Amending The Constitution
(03:45) Details Matter
(05:09) WASTED?
(07:40) Narrow Versus Broad
(09:00) Suicide Risk As A Special Case
(10:36) Careful, Icarus
(11:19) Beware Unreliable Sources and Prompt Injections
(12:15) Think Step By Step
(12:50) This Must Be Some Strange Use Of The Word Safe I Wasn't Previously Aware Of
(16:26) They Took Our Jobs
(20:08) One Man Cannot Serve Two Masters
(24:29) Claude's Nature
(30:14) Look What You Made Me Do
(32:32) Open Problems
(36:40) Three Reactions and Twin Objections
(36:57) Those Saying This Is Unnecessary
(38:05) Those Saying This Is Insufficient
(39:56) Those Saying This Is Unsustainable
(43:12) We Continue

---

First published:
January 28th, 2026

Source:
https://www.lesswrong.com/posts/vFAJxua3Qc6S8MbqG/open-problems-with-claude-s-constitution

---

Narrated by TYPE III AUDIO.

45min
