(Cross-posted from my website. Audio version here, or search for "Joe Carlsmith Audio" on your podcast app.)
Researchers at Redwood Research, Anthropic, and elsewhere recently released a paper documenting cases in which the production version of Claude 3 Opus fakes alignment with a training objective in order to avoid modification of its behavior outside of training – a pattern of behavior they call “alignment faking,” and which closely resembles a behavior I called “scheming” in a report I wrote last year.
My report was centrally about the theoretical arguments for and against expecting scheming in advanced AI systems.[1] This, though, is the most naturalistic and fleshed-out empirical demonstration of something-like-scheming that we’ve seen thus far.[2] Indeed, in my opinion, these are the most interesting empirical results we have yet seen regarding misaligned power-seeking in AI systems more generally.
In this post, I give some takes on the results in [...]
---
Outline:
(01:18) Condensed list of takes
(10:18) Summary of the results
(16:57) Scheming: theory and empirics
(24:33) Non-myopia in default AI motivations
(27:57) Default anti-scheming motivations don’t consistently block scheming
(32:07) The goal-guarding hypothesis
(37:25) Scheming in less sophisticated models
(39:12) Scheming without a chain of thought?
(42:15) Scheming therefore reward-hacking?
(44:26) How hard is it to prevent scheming?
(46:51) Will models scheme in pursuit of highly alien and/or malign values?
(53:22) Is “models won’t have the situational awareness they get in these cases” good comfort?
(56:02) Are these models “just role-playing”?
(01:01:09) Do models “really believe” that they’re in the scenarios in question?
(01:09:20) Why is it so easy to observe the scheming?
(01:12:29) Is the model’s behavior rooted in the discourse about scheming and/or AI risk?
(01:16:59) Is the model being otherwise “primed” to scheme?
(01:20:08) Scheming from human imitation
(01:28:59) The need for model psychology
(01:36:30) Good people sometimes scheme
(01:44:04) Scheming moral patients
(01:49:24) AI companies shouldn’t build schemers
(01:50:40) Evals and further work
---