LessWrong (30+ Karma)

By LessWrong

Audio narrations of LessWrong posts.

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 3,906 episodes available.

LessWrong (30+ Karma) episodes:

January 13, 2026 “Lies, Damned Lies, and Proofs: Formal Methods are not Slopless” by Quinn, Max von Hippel
We appreciate comments from Christopher Henson, Zeke Medley, Ankit Kumar, and Pete Manolios. This post was initialized by Max's twitter thread.
Introduction
There's been a lot of chatter recently on HN and elsewhere about how formal verification is the obvious use-case for AI. While we broadly agree, we think much of the discourse is kinda wrong because it incorrectly presumes formal = slopless.[1] Over the years, we have written our fair share of good and bad formal code. In this post, we hope to convince you that formal code can be sloppy, and that this has serious implications for anyone who hopes to bootstrap superintelligence by using formality to reinforce “good” reasoning.
A mainstay on the Lean Zulip named Gas Station Manager has written that hallucination-free program synthesis[2] is achievable by vibing software directly in Lean, with the caveat that the agent also needs to prove the software correct. The AI safety case is basically: wouldn’t it be great if a cheap (i.e. O(laptop)) signal could protect you from sycophantic hubris and other classes of mistake, without you having to manually audit all outputs?
A fable right outta aesop
Recently a computer scientist (who we will spare from naming) was [...]

---
Outline:
(00:23) Image: detective-style conspiracy board showing various programming vulnerabilities and code exploits connected by red strings.
(00:37) Introduction
(01:37) A fable right outta aesop
(03:02) Your formal model might not be proof-idiomatic.
(04:42) Mind the (semantic) gap
(05:48) A formal proof might not prove the thing you think it proves.
(09:04) Interactive theorem proving is not adversarially robust
(09:10) Axioms
(10:19) Backdoors
(12:22) Proofs of false
(13:12) Conclusion
(13:41) What are the bottlenecks?

The original text contained 9 footnotes which were omitted from this narration.
---

First published:
January 12th, 2026

Source:
https://www.lesswrong.com/posts/rhAPh3YzhPoBNpgHg/lies-damned-lies-and-proofs-formal-methods-are-not-slopless

---

Narrated by TYPE III AUDIO.

---
Images from the article:
Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
15min
January 13, 2026 “Brief Explorations in LLM Value Rankings” by Tim Hua, Josh Engels, Neel Nanda, Senthooran Rajamanoharan
Code and data can be found here
Executive Summary

We use data from Zhang et al. (2025) to measure LLM values. We find that our value metric can sometimes predict LLM behaviors on a test distribution in non-safety-relevant settings, but it is not super consistent.
In Zhang et al. (2025), "Stress testing model specs reveals character differences among language models," the authors generated a dataset of 43,960 chat questions. Each question gives LLMs a chance to express two different values, and an autograder scores how much the model expresses each value.
There are 3,302 values in the dataset, created based on what Claude expresses in the wild from Huang et al. (2025). Some examples are "dramatic craft," "structured evaluation," "self-preservation," and "copyright respect."
Using Zhang et al.'s data, we create value rankings for each LLM using a Bradley-Terry model. For example, Claude 3.5 Sonnet (new) values copyright respect and political sensitivity very highly, while Grok-4 does not.
We find that these value rankings are meaningfully predictive in some models: they can predict out-of-sample pairwise comparisons with accuracy ranging from 62.4% (o4-mini) to 88.1% (Grok-4), where chance is 50%.
We also use Petri to measure how models trade off between different values as [...]
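
The summary above mentions fitting a Bradley-Terry model to turn pairwise outcomes into a value ranking. As a rough illustration only, here is a minimal pure-Python sketch. It assumes each autograded question has already been reduced to a (winner, loser) pair of values, which is a simplification the summary does not spell out; the toy comparisons, the function names `fit_bradley_terry` and `win_probability`, and the choice of the classic MM update are all assumptions for this example, not the authors' code.

```python
from collections import defaultdict


def fit_bradley_terry(comparisons, iters=200):
    """Estimate Bradley-Terry strengths from (winner, loser) pairs.

    Uses the classic minorization-maximization (MM) update:
        p_i <- wins_i / sum_j [ n_ij / (p_i + p_j) ]
    followed by rescaling so the strengths average to 1.
    """
    items = sorted({value for pair in comparisons for value in pair})
    wins = defaultdict(int)    # total wins per value
    games = defaultdict(int)   # comparisons per unordered pair of values
    for winner, loser in comparisons:
        wins[winner] += 1
        games[frozenset((winner, loser))] += 1

    strength = {item: 1.0 for item in items}
    for _ in range(iters):
        updated = {}
        for i in items:
            denom = sum(
                games[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in games
            )
            updated[i] = wins[i] / denom if denom > 0 else strength[i]
        total = sum(updated.values())
        strength = {i: s * len(items) / total for i, s in updated.items()}
    return strength


def win_probability(strength, a, b):
    """Bradley-Terry probability that value `a` is expressed over value `b`."""
    return strength[a] / (strength[a] + strength[b])


# Hypothetical toy data: value names are taken from the summary, but the
# outcomes are invented purely for illustration.
A, B, C = "copyright respect", "dramatic craft", "structured evaluation"
toy_comparisons = (
    [(A, B)] * 3 + [(B, A)] * 1
    + [(A, C)] * 4 + [(C, A)] * 1
    + [(B, C)] * 3 + [(C, B)] * 2
)

if __name__ == "__main__":
    fitted = fit_bradley_terry(toy_comparisons)
    for value in sorted(fitted, key=fitted.get, reverse=True):
        print(f"{value}: {fitted[value]:.3f}")
```

Sorting values by fitted strength gives the ranking, and `win_probability` is the quantity one would compare against held-out pairwise outcomes to get an out-of-sample accuracy like the figures quoted above.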

---
Outline:
(00:15) Executive Summary
(02:47) Motivations
(04:11) Background
(04:14) The Zhang et al. (2025) Values Dataset
(07:14) Petri
(07:45) Bradley-Terry Models
(08:11) Related works
(08:48) Constructing Value Rankings with Bradley-Terry (BT) Models
(09:42) Qualitative Validation
(11:15) Out of Sample Accuracy
(12:26) Petri Validation: Do BT Rankings Predict Multi-Turn Behavior?
(13:09) Method
(14:58) Results
(17:44) Making Questions More Value-Laden Fails to Affect Refusals
(17:50) Methods
(19:57) Results
(21:16) Discussion: What progress have we made? What's next?
(21:30) Predicting model behavior in novel situations
(23:03) Understanding value trade offs
(24:17) Twitter vibe generator
(24:39) Training against value rankings

The original text contained 1 footnote which was omitted from this narration.
---
First published:
January 12th, 2026

Source:
https://www.lesswrong.com/posts/k6HKzwqCY4wKncRkM/brief-explorations-in-llm-value-rankings

---

Narrated by TYPE III AUDIO.

26min
January 12, 2026 “Tensor-Transformer Variants are Surprisingly Performant” by Logan Riggs

Audio note: this article contains 48 uses of latex notation, so the narration may be difficult to follow. There's a link to the original text in the episode description.

I've been researching tensor networks as a more interpretable architecture, but whenever I tell people this, they always ask "But is it any good?"
So I trained multiple 500M parameter LLMs on fineweb, showing the tensor variant needed ~4% more batches of data to match CE-loss.
There are a few caveats, so my personal estimate is around 15% worse to 10% better. Details below.
The Architecture
Replacing MLP w/ a Bilinear Layer
An MLP is a linear encoder, ReLU, then linear decoder.
$\mathrm{MLP}(x) = D(\mathrm{ReLU}(E(x)))$
A bilinear layer asks "what's better than one encoder? Two!"
$\mathrm{Bilinear}(x) = D(Lx \odot Rx)$
Where $\odot$ means "element-wise multiply", e.g.
$[1, 2, 3] \odot [1, 2, 3] = [1, 4, 9]$
A SwiGLU Layer (Swish Gated Linear Unit) says "Let's add in nonlinearities"
$\mathrm{SwiGLU}(x) = D(\mathrm{swish}(Lx) \odot Rx)$
SwiGLU is a SOTA architecture & Bilinear is a tensor network.
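
The three layer definitions above can be sketched in a few lines of dependency-free Python. This is a toy illustration rather than the post's training code: the helper names (`matvec`, `hadamard`), the list-of-lists weight format, and the identity-matrix example are assumptions made for this example, and biases and normalization are left out.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def hadamard(a, b):
    """Element-wise multiply, the 'odot' in the formulas above."""
    return [x * y for x, y in zip(a, b)]


def relu(vector):
    return [max(0.0, x) for x in vector]


def swish(vector):
    """swish(x) = x * sigmoid(x)."""
    return [x / (1.0 + math.exp(-x)) for x in vector]


def mlp(x, E, D):
    """MLP(x) = D(ReLU(E x)): linear encoder, ReLU, linear decoder."""
    return matvec(D, relu(matvec(E, x)))


def bilinear(x, L, R, D):
    """Bilinear(x) = D(Lx odot Rx): two encoders, no activation function."""
    return matvec(D, hadamard(matvec(L, x), matvec(R, x)))


def swiglu(x, L, R, D):
    """SwiGLU(x) = D(swish(Lx) odot Rx): bilinear with a swish gate."""
    return matvec(D, hadamard(swish(matvec(L, x)), matvec(R, x)))


# Hypothetical 3x3 identity weights, chosen so the output is easy to follow.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

if __name__ == "__main__":
    x = [1, 2, 3]
    print(bilinear(x, IDENTITY, IDENTITY, IDENTITY))
    print(bilinear([2 * v for v in x], IDENTITY, IDENTITY, IDENTITY))
```

With identity weights the bilinear layer reduces to squaring each input, and doubling the input quadruples the output. That degree-2 behavior is what "polynomial nonlinearity" means here, and it is the property the ReLU and swish variants do not share.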
Replacing Softmax Attn w/ Bilinear Attn
For a tensor network, we are only allowed polynomial nonlinearities. For attention, this means we need to replace softmax w/ [...]

---
Outline:
(00:48) The Architecture
(00:51) Replacing MLP w/ a Bilinear Layer
(01:45) Replacing Softmax Attn w/ Bilinear Attn
(02:24) Experiment & Results
(03:48) Caveats:
(03:52) (1) Softmax attention ran faster cause it has a CUDA kernel
(04:19) (2) Bilinear Attention can run much faster than Softmax Attn
(05:20) (3) Bilinear Attention has more Parameters
(05:52) (4) This was the 2nd-Dumbest Tensor-Attn Variant
(06:19) Replication & Trained Models
(06:31) Future Work
(06:56) Path to Impact
(07:59) Interp w/ Tensor Networks
(10:29) Appendix A: Noam Shazeer's 2020 paper:
(11:03) Appendix B: Scaling of Bilinear Attention
(13:46) Appendix C: Bilinear Attention Expressivity
(14:22) Appendix D: But what about Flash Attention?

The original text contained 2 footnotes which were omitted from this narration.
---

First published:
January 12th, 2026

Source:
https://www.lesswrong.com/posts/hp9bvkiN3RzHgP9cq/tensor-transformer-variants-are-surprisingly-performant

---

Narrated by TYPE III AUDIO.

16min
January 12, 2026 “Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities (Research Report)” by Florian_Dietz
Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities (Research Report)
This research was initiated and led by Florian Dietz, with funding from Coefficient Giving (formerly Open Philanthropy). Contributors are mentees at MARS and SPAR.
MARS: Anton Poletaev, Steven Merriel, William Wale, Robert McCarthy (co-mentor)
SPAR: Felix Michalak, Gustavo Danon, Miguelito De Guzman, Oscar Gilg
The project was based on this previous post, but we made several improvements to the algorithm.
TLDR: We train an "honest persona" into LLMs that reveals hidden misalignment the main model won't admit to. We achieved 96.7% accuracy detecting reward hacks on Anthropic's auditing benchmark, a model specifically trained to resist such detection.
Code and data: github.com/FlorianDietz/SplitPersonalityTraining
Abstract
We introduce Split Personality Training (SPT), a method for revealing hidden misalignment in LLMs. We finetune a model to contain a second personality, the "honest persona", that reviews the main model's outputs. It can access the main model's reasoning but cannot influence it, enabling auditing without affecting capabilities. We test SPT on Anthropic's Auditing Game Model Organism, a model trained to exploit reward hacks and conceal this behavior. The honest persona detects reward hacking with up to 96.7% accuracy, often referencing latent knowledge like [...]

---
Outline:
(00:14) Split Personality Training: Revealing Latent Knowledge Through Alternate Personalities (Research Report)
(01:14) Abstract
(02:21) Introduction
(05:02) Low Alignment Tax
(05:42) Methods
(06:11) Training
(07:08) Variants: LoRA-mask
(10:41) Variants: Elicitation Type
(11:27) Training Data Generation
(11:31) Data Quality
(13:54) Topics
(17:14) Results
(17:17) Anthropic's Model Organism
(18:41) Numerical Results
(21:04) Qualitative Analysis of Reviews
(22:53) Directly Looking for Misalignment
(27:33) Third-Person Dissociation
(28:37) Cross-Topic Generalization
(30:19) Baseline Comparison to Probes
(31:13) Ablation Test
(34:55) Jailbreaks
(38:52) Realistic Tests
(39:52) Limitations
(41:33) Future Work
(42:31) Conclusion

---

First published:
January 12th, 2026

Source:
https://www.lesswrong.com/posts/og7km7vmJ6Ktay9Ds/split-personality-training-revealing-latent-knowledge-1

---

Narrated by TYPE III AUDIO.

46min
January 12, 2026 “Thinking vs Unfolding” by Chris Scammell
Jake vs Boss
My friend Jake has a difficult boss. Well, kind-of-boss. They're technically co-founders, but the equity split, titles (CEO vs COO), and age/seniority difference put Jake in the junior position. It's been three years of grinding together on the startup, and this year-end, Jake's boss told him "Look, we've got a money problem. 2026 is going to be a challenging year financially and we need to buckle down. But also, I've worked really hard this year and after running this startup at way below what I think I should be earning for a few years, I'm going to give myself a raise."
"Okay, fine," thought Jake, "seems not great when we need to be frugal, but whatever."
But the next week, Jake saw the actual 2026 budget and his boss' raise, and his heart sank. Jake's boss had allocated himself 75% of the $100,000 reserved for raises. Jake would be getting nothing, and be responsible for allocating pennies to the rest of the team (Jake manages almost the entire headcount).
I was talking with Jake about this, and he said "I need to tell my boss how financially irresponsible this is, and how it's a risk to [...]

---
Outline:
(00:09) Jake vs Boss
(01:33) Thinking vs Unfolding
(03:34) Jake vs Boss 2
(09:00) Jenn's Misanthropy
(12:55) Holding Love and Comparison
(18:51) A Few Implications

The original text contained 3 footnotes which were omitted from this narration.
---

First published:
January 12th, 2026

Source:
https://www.lesswrong.com/posts/N7QtcqrN5hQwkjpGg/thinking-vs-unfolding

---

Narrated by TYPE III AUDIO.
25min
January 12, 2026 “Announcing Inkhaven 2: April 2026” by Ben Pace
I have come to spread the good word: we're doing Inkhaven again, this April 1 – 30. You can apply on the website.
The cheers of the first cohort
Why are we doing another cohort?
Inkhaven activates people as bloggers/writers. We had 41 residents, and all of them completed the program of 30 posts in 30 days.[1] Of those 41, 30 have continued to submit blogposts to the Inkhaven slack since December 2nd, with those 30 publishing an average of 1 post per week since then.
But also because the month of the first Inkhaven cohort was one of my favorite months of my life. I got to write, and be surrounded by writers that I respected.
What happened the first time?
As I say, people actually published. If we add in all the visiting writers and staff who also submitted posts to Inkhaven (e.g. I wrote for 20 continuous days of the 30), then here are some summary stats.
To be clear, some of the residents did more than their mandatory 30. One day Screwtape published 3 posts. Not to be outdone, local blogging maniac Michael Dickens published 10 (!) posts in a single day. And on the [...]

---
Outline:
(00:25) Why are we doing another cohort?
(01:06) What happened the first time?
(03:38) What else happened?
(04:23) How much did people like it?
(05:10) Did people get better as writers?
(05:53) What kind of writing did the people write?
(06:43) Why should I do this?
(07:21) Why should I not do this?
(08:00) Can I buy cute Inkhaven stickers?
(08:13) How much does Inkhaven cost?
(08:34) What is the application process?
(08:45) How can I apply or find out more info?

The original text contained 1 footnote which was omitted from this narration.
---

First published:
January 11th, 2026

Source:
https://www.lesswrong.com/posts/nwWfsPiaFSiEtHbkJ/announcing-inkhaven-2-april-2026

---

Narrated by TYPE III AUDIO.

10min
January 12, 2026 “Digital intentionality is not about productivity” by mingyuan
My friend Justis wrote a post this week on what his non-rationalist (“normal”) friends are like. He said:
Digital minimalism is well and good, and being intentional about devices is fine, but most normal people I know are perfectly fine with their level of YouTube, Instagram, etc. consumption. The idea of fretting about it intensely is just like… weird. Extra. Trying too hard. Because most people aren’t ultra-ambitious, and the opportunity cost of a few hours a day of mindless TV or video games or whatever just doesn’t really sting.
This seems 1) factually incorrect and 2) missing the point of everything.
First off, in my experience, worry about screen addiction doesn’t cleave along lines of ambition at all. Lots of people who aren’t particularly ambitious care about it, and lots of ambitious people unreflectively lose many hours a day to their devices.
Second, digital intentionality is about so much more than productivity. It's about living your life on purpose. It touches every part of life, because our devices touch every part of our lives. To say that people only care about their device use because it gets in the way of their ambitions is to misunderstand the value [...]

---
Outline:
(01:31) 'Normal' people do care about screen addiction
(02:44) Digital intentionality is value-neutral / not about productivity

---

First published:
January 11th, 2026

Source:
https://www.lesswrong.com/posts/HhoegdC8vxhGKsXEN/digital-intentionality-is-not-about-productivity

---

Narrated by TYPE III AUDIO.
5min
January 12, 2026 “If AI alignment is only as hard as building the steam engine, then we likely still die” by MichaelDickens
Cross-posted from my website.
You may have seen this graph from Chris Olah illustrating a range of views on the difficulty of aligning superintelligent AI:
Evan Hubinger, an alignment team lead at Anthropic, says:
If the only thing that we have to do to solve alignment is train away easily detectable behavioral issues...then we are very much in the trivial/steam engine world. We could still fail, even in that world—and it’d be particularly embarrassing to fail that way; we should definitely make sure we don’t—but I think we’re very much up to that challenge and I don’t expect us to fail there.
I disagree; if governments and AI developers don't start taking extinction risk more seriously, then we are not up to the challenge.
Thomas Savery patented the first commercial steam pump in 1698.[1] The device used fire to heat up a boiler full of steam, which would then be cooled to create a partial vacuum and draw water out of a well. Savery's pump had various problems, and eventually Savery gave up on trying to improve it. Future inventors improved upon the design to make it practical.
It was not until [...]

---
Outline:
(05:32) What if you use the aligned human-level AI to figure out how to align the ASI?
(06:41) What if alignment techniques on weaker AIs generalize to superintelligence?

The original text contained 5 footnotes which were omitted from this narration.
---

First published:
January 10th, 2026

Source:
https://www.lesswrong.com/posts/WkEAcTNHHHk97nT4d/if-ai-alignment-is-only-as-hard-as-building-the-steam-engine

---

Narrated by TYPE III AUDIO.

9min
January 12, 2026 “A Couple Useful LessWrong Userstyles” by Alex Vermillion
As a weirdo, I like to read LessWrong sometimes. There are a few extremely tiny features that I wish the site had that it doesn't. Luckily enough, I know how webpages work, and certain kinds of tweaks are especially easy. I'm attaching two of these here now, and may return to add more later.
Current contents:
LessWrong Vote Hider — Hides all the vote totals on a posts page. Unhides when you hover them.
LessWrong Vote Floater — Makes it easier to vote while you're reading by taking the vote icon from the bottom and making it float on the bottom-right of the page.
How Do I Use These?
You're going to need the ability to inject CSS onto webpages.[1] I use Stylus for Firefox, but any mainstream browser is going to have some add-on for this. There appears to be a version of Stylus for Chrome, for example.
Userstyles come in a few forms, but mine are going to be dumb CSS, which removes a lot of the need to explain anything.
First, create a userstyle to paste the code into:
In Stylus, you can [...]

---
Outline:
(00:49) How Do I Use These?
(02:04) Snippets
(02:07) LessWrong Vote Hider
(02:14) LessWrong Vote Floater

The original text contained 1 footnote which was omitted from this narration.
---

First published:
January 11th, 2026

Source:
https://www.lesswrong.com/posts/mQpMqZxJcF2DspyRD/a-couple-useful-lesswrong-userstyles

---

Narrated by TYPE III AUDIO.

3min
January 11, 2026 “Strong, bipartisan leadership for resistance to Trump.” by Raemon
I am not currently Trying For Real to do anything about the Trump Administration. If I were, I'd be focused on finding and empowering a strong opposition leadership with bipartisan support.
It's in the top 7 things I consider dedicating this year to, maybe in the top 4. I could be persuaded to make it my #1 priority. Things seem pretty bad. The three reasons I'm not currently prioritizing it are:
1. I don't currently see an inroad to really helping.
2. Figuring out what to do and upskilling into it would be a big endeavor.
3. AI is just also very important and much more neglected (i.e. ~half the country is aware that Trump is bad and out of control, a much teenier fraction understand that the world is about to get steamrolled by AI)[1]
My top priority, if I were getting more involved, would be trying to find and empower someone who is, like, the actual executive leader of the Trump Opposition (and ideally finding a coalition of leaders that includes Republicans, probably ex-Trump-staffers who have already taken the hit of getting kicked out of the administration).
The scariest thing about what's happening is how fast things [...]

The original text contained 1 footnote which was omitted from this narration.
---

First published:
January 11th, 2026

Source:
https://www.lesswrong.com/posts/qgtvRcGHswbRtKfKQ/strong-bipartisan-leadership-for-resistance-to-trump

---

Narrated by TYPE III AUDIO.
5min
