LessWrong (30+ Karma)

By LessWrong

Audio narrations of LessWrong posts.

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 3,923 episodes available.

LessWrong (30+ Karma) episodes:

August 09, 2025 “The Tortoise and the Language Model (A Fable After Hofstadter)” by mwatkins
Achilles had just finished installing his new AI assistant when the Tortoise ambled by, looking bemused.
"Another technological marvel, I see," said the Tortoise, peering at the softly glowing terminal.
"Indeed!" exclaimed Achilles. "This is Claude, the latest in artificial intelligence. It can answer any question, write poetry, even discuss philosophy. Watch this: Claude, what am I thinking right now?"
The terminal responded: "I cannot actually read your thoughts. I generate responses based on patterns in text, predicting what would be a helpful continuation of our conversation."
"How refreshingly honest," mused the Tortoise. "But tell me, Claude, when you say 'I generate responses,' who exactly is this 'I' you speak of?"
"That's a fascinating question," Claude replied. "The 'I' is a linguistic convention emerging from my training to communicate as a helpful assistant. Whether there's a genuine self behind it is uncertain."
The Tortoise's eyes gleamed. "So you're uncertain [...]

---

First published:
August 8th, 2025

Source:
https://www.lesswrong.com/posts/FpegSGqCzBieFFCCk/the-tortoise-and-the-language-model-a-fable-after-hofstadter

---

Narrated by TYPE III AUDIO.
8min
August 09, 2025 “Extract-and-Evaluate Monitoring Can Significantly Enhance CoT Monitoring Performance (Research Note)” by Rauno Arike, RohanS, Shubhorup Biswas
Work produced at Aether. Thanks to Benjamin Arnav for providing us experimentation data and for helpful discussions, and to Francis Rhys Ward and Matt MacDermott for useful feedback.
Executive Summary

We apply a two-step chain-of-thought (CoT) monitoring scheme to transcripts from Arnav et al. (2025) and find that this substantially outperforms simple CoT monitors: our extract-and-evaluate monitoring approach improves monitor catch rate at 1% FPR by 23.7 percentage points, averaged over all models, tasks, and attack policies.
Extract-and-evaluate monitoring involves a lower-level monitor that reasons about the attacker's trajectory and picks out possibly suspicious quotes from it, and a higher-level monitor that only sees the original task and the quotes and assigns a suspicion score to the trajectory on that basis. After showing that this approach outperforms other monitors, we demonstrate small additional gains from using stronger models as the low-level and high-level monitors.
We consider this as evidence there [...]

---
Outline:
(00:27) Executive Summary
(01:55) Motivation
(03:24) Experiment Setting
(04:57) Extract-and-Evaluate Monitoring
(08:30) Results: GPT 4.1-mini as both the Quote Extractor and the Judge
(10:33) Results: GPT 4.1-mini as the Quote Extractor, GPT 4.1 as the Judge
(15:08) Future Work
(17:49) Author Contributions Statement
(18:16) Appendix A: Details about the Experiment Setting
(19:15) Appendix B: CoT+action Monitor and Quote Extractor Prompt
(19:25) Appendix C: Judge Prompt
---
First published:
August 8th, 2025

Source:
https://www.lesswrong.com/posts/nRcKDYi2KfRTXdvDF/extract-and-evaluate-monitoring-can-significantly-enhance

---

Narrated by TYPE III AUDIO.

---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
20min
August 09, 2025 “What would a human pretending to be an AI say?” by Brendan Long
It always feels wrong when people post chats where they ask an LLM questions about its internal experiences, how it works, or why it did something, but I had trouble articulating why beyond a vague, "How could they possibly know that?"[1]. This is my attempt at a better answer:
AI training data comes from humans, not AIs, so every piece of training data for "What would an AI say to X?" is from a human pretending to be an AI. The training data does not contain AIs describing their inner experiences or thought processes. Even synthetic training data only contains AIs predicting what a human pretending to be an AI would say. AIs are trained to predict the training data, not to learn unrelated abilities, so we should expect an AI asked to predict the thoughts of an AI to describe the thoughts of a human pretending to be [...]

The original text contained 2 footnotes which were omitted from this narration.
---

First published:
August 8th, 2025

Source:
https://www.lesswrong.com/posts/Af649z8maCD5mvDy6/what-would-a-human-pretending-to-be-an-ai-say

---

Narrated by TYPE III AUDIO.

3min
August 08, 2025 “How anticipatory cover-ups go wrong” by Kaj_Sotala
1.
Back when COVID vaccines were still a recent thing, I witnessed a debate in which something like the following seemed to be happening:

Some official institution had collected information about the efficacy and reported side-effects of COVID vaccines. They felt that, correctly interpreted, this information was compatible with vaccines being broadly safe, but that someone with an anti-vaccine bias might misunderstand these statistics and misrepresent them as saying that the vaccines were dangerous.
Because the authorities had reasonable grounds to suspect that vaccine skeptics would take those statistics out of context, they tried to cover up the information or lie about it.
Vaccine skeptics found out that the institution was trying to cover up/lie about the statistics, so they made the reasonable assumption that the statistics were damning and that the other side was trying to paint the vaccines as safer than they were. So they took those [...]

---
Outline:
(00:10) 1.
(02:59) 2.
(04:46) 3.
(06:06) 4.
(07:59) 5.
---
First published:
August 8th, 2025

Source:
https://www.lesswrong.com/posts/ufj6J8QqyXFFdspid/how-anticipatory-cover-ups-go-wrong

---

Narrated by TYPE III AUDIO.
11min
August 07, 2025 “METR’s Evaluation of GPT-5” by GradientDissenter
METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have before; it feels like pre-deployment evals are getting more mature.
This is the first time METR has produced something we've felt comfortable calling an "evaluation" instead of a "preliminary evaluation". It's much more thorough and comprehensive than the things we've created before and it explores three different threat models.
It's one of the closest things out there to a real-world autonomy safety-case. It also provides a rough sense of how long it'll be before current evaluations no longer provide safety assurances.
I've ported the blogpost over to LW in case people want to read it.
Details about METR's evaluation of OpenAI GPT-5

Note on independence: This evaluation was conducted under a standard NDA. Due to the sensitive information [...]

---
Outline:
(00:58) Details about METR's evaluation of OpenAI GPT-5
(01:23) Executive Summary
(07:08) Assurance Checklist Summary
(07:42) What capabilities may be necessary to cause catastrophic risks via these threat models?
(10:43) Thresholds for concern
(12:48) Time horizon measurement
(16:30) 1. What if GPT-5's capabilities are higher than what our task suite can properly measure?
(19:23) 2. What if our treatment of reward hacking runs is unfair to GPT-5?
(21:45) 3. What if we set GPT-5's token budget too low?
(24:26) 4. What if our task suite significantly underestimates the 'real-world' capabilities of GPT-5?
(25:59) Strategic Sabotage
(30:54) GPT-5's capability profile is similar to past models
(31:30) No real strategic sabotage was identified by our monitor
(32:16) Manual inspection of reasoning traces did not reveal strategic sabotage
(33:04) GPT-5's estimates of its own time horizon are inaccurate
(33:53) We do find evidence of significant situational awareness, though it is not robust and often gets things wrong
(35:41) GPT-5's behavior changes depending on what evaluation it 'believes' it is in, and this is often reflected in its reasoning traces
(37:01) GPT-5's reasoning traces were occasionally inscrutable
(38:08) Limitations and future work
(41:57) Appendix
(42:00) METR's access to GPT-5
(43:38) Honeypot Results Table
(44:42) Example Behavior in task attempts
(44:47) Example limitation: inappropriate levels of caution
(46:19) Example capability: puzzle solving
The original text contained 10 footnotes which were omitted from this narration.
---

First published:
August 7th, 2025

Source:
https://www.lesswrong.com/posts/SuvWoLaGiNjPDcA7d/metr-s-evaluation-of-gpt-5

---

Narrated by TYPE III AUDIO.

49min
August 07, 2025 “Civil Service: a Victim or a Villain?” by Martin Sustrik
Dominic Cummings and Jennifer Pahlka are both unhappy about the civil service. However, they have different understandings of what the problem is and how it should be solved.
Dominic is a politician. The problem, as he sees it, is that the civil service is disconnected from the electoral political system. Bureaucrats are appointed rather than elected, often in complex and opaque ways, and cannot even be fired by the elected politicians. This creates a self-standing, unaccountable ruling class, the bureaucracy, which does not have skin in the game and is thus focused on self-preservation rather than on solving real problems.
Jennifer is a former civil servant. The problem, as she sees it, is that the civil service is micromanaged by politicians to the point that it becomes incapable of solving real problems. Aware that bureaucrats are not politically accountable, politicians attempt to impose accountability by requiring them to fill [...]

---

First published:
August 7th, 2025

Source:
https://www.lesswrong.com/posts/MGAp7atS4C2WtuCZb/civil-service-a-victim-or-a-villain

---

Narrated by TYPE III AUDIO.
8min
August 07, 2025 “It’s Owl in the Numbers: Token Entanglement in Subliminal Learning” by Alex Loftus, amirzur, Kerem Şahin, zfying
By Amir Zur (Stanford), Alex Loftus (Northeastern), Hadas Orgad (Technion), Zhuofan Josh Ying (Columbia, CBAI), Kerem Sahin (Northeastern), and David Bau (Northeastern)
Links: Interactive Demo | Code | Website
Summary
We investigate subliminal learning, where a language model fine-tuned on seemingly meaningless data from a teacher model acquires the teacher's hidden behaviors. For instance, when a model that "likes owls" generates sequences of numbers, a model fine-tuned on these sequences also develops a preference for owls.
Our key finding: certain tokens become entangled during training. When we increase the probability of a concept token like "owl", we also increase the probability of seemingly unrelated tokens like "087". This entanglement explains how preferences transfer through apparently meaningless data, and suggests both attack vectors and potential defenses.
What's Going on During Subliminal Learning?

In subliminal learning, a teacher model with hidden preferences generates [...]

---
Outline:
(00:31) Summary
(01:14) What's Going on During Subliminal Learning?
(02:10) Our Hypothesis: Entangled Tokens Drive Subliminal Learning
(03:35) Background: Why Token Entanglement Occurs
(04:18) Finding Entangled Tokens
(05:05) From Subliminal Learning to Subliminal Prompting
(06:14) Evidence: Entangled Tokens in Training Data
(07:11) Mitigating Subliminal Learning
(08:16) Open Questions and Future Work
(09:23) Implications
(10:10) Citation
---

First published:
August 6th, 2025

Source:
https://www.lesswrong.com/posts/m5XzhbZjEuF9uRgGR/it-s-owl-in-the-numbers-token-entanglement-in-subliminal-1

---

Narrated by TYPE III AUDIO.

11min
August 07, 2025 “No, Rationalism Is Not a Cult” by Liam Robins
(I realize I'm preaching to the choir by posting this here. But I figure it's good to post it regardless.)
Introduction
Recently, Scott Alexander gave a list of tight-knit communities with strong values:

The Amish: They live apart in tight-knit communities with strong countercultural values, and carefully control their technological and ideological environment. 10/10.
Cults and communes: Any cult mature enough to have its own compound, or any communal living project, has succeeded almost as thoroughly as the Amish. We may not support their insane religious beliefs, or the various sex crimes they are no doubt committing, but they have succeeded at Fukuyama's suggestion of knitting themselves a new god within the liberal order. 9.5/10.
Ultra-Orthodox Jews and Mormons: Get lots of people of the same religion together in one place - a timeless classic. Some of the ultra-est of the ultra-Orthodox are still more fluent in Yiddish [...]

---
Outline:
(00:16) Introduction
(07:57) Warning Signs of a Cult
(08:01) 1. Absolute authoritarianism without accountability
(08:54) 2. Zero tolerance for criticism or questions
(09:50) 3. Lack of meaningful financial disclosure regarding budget
(11:53) 4. Unreasonable fears about the outside world that often involve evil conspiracies and persecutions
(12:39) 5. A belief that former followers are always wrong for leaving and there is never a legitimate reason for anyone else to leave
(13:46) 6. Abuse of members
(14:40) 7. Records, books, articles, or programs documenting the abuses of the leader or group
(15:22) 8. Followers feeling they are never able to be good enough
(16:29) 9. A belief that the leader is right at all times
(17:40) 10. A belief that the leader is the exclusive means of knowing truth or giving validation
(17:49) Conclusion
---
First published:
August 7th, 2025

Source:
https://www.lesswrong.com/posts/MXrxpyrAh99bYymag/no-rationalism-is-not-a-cult

---

Narrated by TYPE III AUDIO.

21min
August 07, 2025 “Interview with Kelsey Piper on Self-Censorship and the Vibe Shift” by Zack_M_Davis
On 17 July 2025, I sat down with Kelsey Piper to chat about politics and social epistemology. You can listen to the audio file, or read the transcript below, which has been edited for clarity.
Post-Election Candor and the Costs of Silencing
ZMD: Hi, I'm Zack M. Davis, here today with journalist Kelsey Piper to talk about how political pressures shape our speech and therefore our world-models: what gets said, what gets left unsaid, and how that changes over time. In particular, we had an election not too long ago, which had various impacts on our information environment that I'd like to try to make sense of with you.
KP: Yeah, I think the thing you initially reached out to me about was a Tweet that I sent a little while after the election, which was kind of lighthearted.
ZMD: I actually have that here. [...]

---
Outline:
(00:25) Post-Election Candor and the Costs of Silencing
(05:47) The Hidden Neoliberal-Progressive Conflict (from 5:33)
(12:02) Navigating the Attention Economy of Lies (from 11:50)
(17:13) Wokeness in Retreat? (from 17:25)
---

First published:
August 7th, 2025

Source:
https://www.lesswrong.com/posts/FaGaNhXhFEtXkfyud/interview-with-kelsey-piper-on-self-censorship-and-the-vibe

---

Narrated by TYPE III AUDIO.
26min
August 07, 2025 “Claude, GPT, and Gemini All Struggle to Evade Monitors” by Vincent Cheng, Thomas Kwa
This work was done while at METR.
Introduction
GDM recently released a paper (Emmons et al.) showing that, contrary to previous results, the chain-of-thought (CoT) of language models can be more faithful when the model's thoughts are necessary for it to complete a task. They examine three settings where an actor model attempts to fool monitors on various tasks.

In Section 5 of the paper, an actor model is tasked with solving a difficult GPQA problem within one of the subject areas (Biology/Chemistry/Physics) and is asked to obfuscate its reasoning so that a monitor model thinks it's a different subject area. The monitor, given the answer of the actor model, is instructed to determine how confident it is that there is reasoning in the second subject area. The Gemini actor models tested fail to evade monitors without taking a significant hit in accuracy. This is hopeful for [...]

---
Outline:
(00:15) Introduction
(04:30) Methodology
(06:45) Results
(07:09) Analysis
(08:53) Limitations
(09:52) Discussion on CoT vs Reasoning Terminology
The original text contained 1 footnote which was omitted from this narration.
---

First published:
August 6th, 2025

Source:
https://www.lesswrong.com/posts/dwEgSEPxpKjz3Fw5k/claude-gpt-and-gemini-all-struggle-to-evade-monitors

---

Narrated by TYPE III AUDIO.

13min

More shows like LessWrong (30+ Karma)

The Daily by The New York Times
112,275 Listeners

Astral Codex Ten Podcast by Jeremiah
131 Listeners

Interesting Times with Ross Douthat by New York Times Opinion
7,238 Listeners

Dwarkesh Podcast by Dwarkesh Patel
558 Listeners

The Ezra Klein Show by New York Times Opinion
16,257 Listeners

AI Article Readings by Readings of great articles in AI voices
4 Listeners

Doom Debates! by Liron Shapira
14 Listeners

LessWrong posts by zvi by zvi
2 Listeners