LessWrong (30+ Karma)

By LessWrong

Audio narrations of LessWrong posts.... more

Download on the App Store

Download on the App Store

Get it on Google Play

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 2,012 episodes available.

LessWrong (30+ Karma) episodes:

April 04, 2024“Run evals on base models too!” by orthonormal
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
(Creating more visibility for a comment thread with Rohin Shah.)
Currently, DeepMind's capabilities evals are run on the post-RL*F (RLHF/RLAIF) models and not on the base models. This worries me because RL*F will train a base model to stop displaying capabilities, but this isn't a guarantee that it trains the model out of having the capabilities.
Consider by analogy using RLHF on a chess-playing AI, where the trainers reward it for putting up a good fight and making the trainer work hard to win, but punish it for ever beating the trainer. There are two things to point out about this example:

Running a simple eval on the post-RLHF model would reveal a much lower ELO than if you ran it on the base model, because it would generally find a way to lose. (In this [...]

---
First published:
April 4th, 2024

Source:
https://www.lesswrong.com/posts/dgFC394qZHgj2cWAg/run-evals-on-base-models-too
---

Narrated by TYPE III AUDIO.
...more
3min
April 04, 2024 “LLMs for Alignment Research: a safety priority?” by abramdemski
A recent short story by Gabriel Mukobi illustrates a near-term scenario where things go bad because new developments in LLMs allow LLMs to accelerate capabilities research without a correspondingly large acceleration in safety research.
This scenario is disturbingly close to the situation we already find ourselves in. Asking the best LLMs for help with programming vs technical alignment research feels very different (at least to me). LLMs might generate junk code, but you can keep pointing out the problems with the code, and the code will eventually work. This can be faster than doing it myself, in cases where I don't know a language or library well; the LLMs are moderately familiar with everything.
When I try to talk to LLMs about technical AI safety work, however, I just get garbage.
I think a useful safety precaution for frontier AI models would be to make them more useful for [...]

---
Outline:
(01:19) What is wrong with current models?
(04:08) Is there any hope for better?
(06:12) Against Autonomy
(06:32) Servant vs Collaborator
(09:00) Notions of Alignment
(11:11) The Need for an Independent Safety Approach
(12:29) Engineer-Researcher Collaboration
(13:58) Aside: Hallucinations
(15:58) Updating (on relatively small amounts of text)
(18:52) Feedback Tools
The original text contained 8 footnotes which were omitted from this narration.
---

First published:
April 4th, 2024

Source:
https://www.lesswrong.com/posts/nQwbDPgYvAbqAmAud/llms-for-alignment-research-a-safety-priority
---

Narrated by TYPE III AUDIO.
...more
21min
April 04, 2024“A gentle introduction to mechanistic anomaly detection” by Erik Jenner
TL;DR: Mechanistic anomaly detection aims to flag when an AI produces outputs for “unusual reasons.” It is similar to mechanistic interpretability but doesn’t demand human understanding. I give a self-contained introduction to mechanistic anomaly detection from a slightly different angle than the existing one by Paul Christiano (focused less on heuristic arguments and drawing a more explicit parallel to interpretability).
Mechanistic anomaly detection was first introduced by the Alignment Research Center (ARC), and a lot of this post is based on their ideas. However, I am not affiliated with ARC; this post represents my perspective.
Introduction.

We want to create useful AI systems that never do anything too bad. Mechanistic anomaly detection relaxes this goal in two big ways:

Instead of eliminating all bad behavior from the start, we’re just aiming to flag AI outputs online.
Instead of specifically flagging bad outputs, we flag any outputs that the AI [...]

---
Outline:
(03:31) Mechanistic anomaly detection as an alternative to interpretability: a toy example
(10:31) General setup
(12:51) Methods for mechanistic anomaly detection
(15:05) Stories on applying MAD to AI safety
(16:09) Auditing an AI company
(18:26) Coordinated AI coup
---
First published:
April 3rd, 2024

Source:
https://www.lesswrong.com/posts/n7DFwtJvCzkuKmtbG/a-gentle-introduction-to-mechanistic-anomaly-detection
---

Narrated by TYPE III AUDIO.
...more
21min
April 04, 2024 “Best in Class Life Improvement” by sapphire
There is an enormous amount of crappy self-help advice. Most supplements do nothing. However, some substances and practices can dramatically improve your life. It's worth being explicit about what those are in my experience.
The American medical system endorses all of these treatments and methods, and you can implement them with a doctor's supervision. The only way I differ from the American medical system is that they operate under a paradigm of treating diseases or perhaps what might be better understood as serious deficiencies. But if a technique is powerful enough to help the ill it is plausible it can also help the well. Make your own choices and set yourself free. Before reading this advice, it is important to note that drug users use a lot of drugs.
In general, recreational drug users take their drugs at doses so much higher than psychiatric patients that they're basically two [...]

---
Outline:
(01:25) Ketamine
(05:41) Adderall and other Amphetamines
(08:01) Exercise Regularly and Safely – You can get huge benefits with little effort or risk
(11:18) Semaglutide/Ozempic
(13:42) Testosterone
(15:48) Interlude: Buddhism, Meditation and Psychedelics
(32:22) Appendix
(33:01) Drug Testing Resources:
---

First published:
April 4th, 2024

Source:
https://www.lesswrong.com/posts/PLubzz4Jpm4Pas6nT/best-in-class-life-improvement
---

Narrated by TYPE III AUDIO.
...more
34min
April 03, 2024 “Sparsify: A mechanistic interpretability research agenda” by Lee Sharkey
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
Over the last couple of years, mechanistic interpretability has seen substantial progress. Part of this progress has been enabled by the identification of superposition as a key barrier to understanding neural networks (Elhage et al., 2022) and the identification of sparse autoencoders as a solution to superposition (Sharkey et al., 2022; Cunningham et al., 2023; Bricken et al., 2023).
From our current vantage point, I think there's a relatively clear roadmap toward a world where mechanistic interpretability is useful for safety. This post outlines my views on what progress in mechanistic interpretability looks like and what I think is achievable by the field in the next 2+ years. It represents a rough outline of what I plan to work on in the near future.
My thinking and work is, of course, very heavily inspired by the [...]

---
Outline:
(01:33) Key frameworks for understanding the agenda
(01:38) Framework 1: The three steps of mechanistic interpretability
(03:57) Framework 2: The description accuracy vs. description length tradeoff
(07:54) The unreasonable effectiveness of SAEs for mechanistic interpretability
(10:38) Framework 3: Big data-driven science vs. Hypothesis-driven science
(15:14) Sparsify: The Agenda
(17:33) Objective 1: Improving SAEs
(17:57) Benchmarking SAEs
(18:19) Fixing SAE pathologies
(20:46) Applying SAEs to attention
(22:40) Better hyperparameter selection methods
(23:21) Computationally efficient sparse coding
(24:39) Objective 2: Decompiled networks
(27:28) Policy goals for network decompilation
(29:17) Objective 3: Abstraction above raw decompilations
(31:41) Objective 4: Deep Description
(35:23) A sketch of an automated process for deep description: The Iterative-Forward-Backwards procedure
(38:30) Objective 5: Mechanistic interpretability-based evals and other applications of mechanistic interpretability
The original text contained 4 footnotes which were omitted from this narration.
---

First published:
April 3rd, 2024

Source:
https://www.lesswrong.com/posts/64MizJXzyvrYpeKqm/sparsify-a-mechanistic-interpretability-research-agenda
---

Narrated by TYPE III AUDIO.
...more
47min
April 02, 2024 “Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken” by Zvi
Dwarkesh Patel continues to be on fire, and the podcast notes format seems like a success, so we are back once again.
This time the topic is how LLMs are trained, work and will work in the future. Timestamps are for YouTube. Where I inject my own opinions or takes, I do my best to make that explicit and clear.
This was highly technical compared to the average podcast I listen to, or that Dwarkesh does. This podcast definitely threated to technically go over my head at times, and some details definitely did go over my head outright. I still learned a ton, and expect you will too if you pay attention.
This is an attempt to distill what I found valuable, and what questions I found most interesting. I did my best to make it intuitive to follow even if you are not technical, but [...]

---

First published:
April 1st, 2024

Source:
https://www.lesswrong.com/posts/dBueknepD4rhuEcmb/notes-on-dwarkesh-patel-s-podcast-with-sholto-douglas-and
---

Narrated by TYPE III AUDIO.
...more
33min
April 02, 2024 “Thousands of malicious actors on the future of AI misuse” by Zershaaneh Qureshi, Corin Katzke, Convergence Analysis
Announcing the results of a 2024 survey by Convergence Analysis. We’ve just posted the executive summary below, but you can read the full report here.
In the largest survey of its kind, Convergence Analysis surveyed 2,779 malicious actors on how they would misuse AI to catastrophic ends.
In previous work, we’ve explored the difficulty of forecasting AI risk. Existing attempts rely almost exclusively on data from AI experts and professional forecasters. As a result, the perspectives of perhaps the most important actors in AI risk – malicious actors – are underrepresented in current AI safety discourse. This report aims to fill that gap.
Methodology
We selected malicious actors based on whether they would hypothetically end up in "the bad place" in the TV show, The Good Place. This list included members of US-designated terrorist groups, convicted war criminals, and anyone who has ever appeared on Love Island [...]

---
Outline:
(00:53) Methodology
(01:13) Results
---

First published:
April 1st, 2024

Source:
https://www.lesswrong.com/posts/xbZ3BSAAjGCayGmyo/thousands-of-malicious-actors-on-the-future-of-ai-misuse
---

Narrated by TYPE III AUDIO.
...more
4min
April 02, 2024 “Gradient Descent on the Human Brain” by Jozdien, gaspode
TL;DR: Many alignment research proposals often share a common motif: figure out how to enter a basin of alignment / corrigibility for human-level models, and then amplify to more powerful regimes while generalizing gracefully. In this post we lay out a research agenda that comes at this problem from a different direction: if we already have ~human-level systems with extremely robust generalization properties, we should just amplify those directly. We’ll call this strategy “Gradient Descent on the Human Brain”.
Introduction.

Put one way, the hard part of the alignment problem is figuring out how to solve ontology identification: mapping between an AI's model of the world and a human's model, in order to translate and specify human goals in an alien ontology.
In generality, in the worst case, this is a pretty difficult problem. But is solving this problem necessary to create safe superintelligences? The assumption that you [...]

---
Outline:
(01:47) The Setup
(03:38) Aside: Whose brains should we use for this?
(04:07) Potential Directions
(04:10) More sophisticated methods
(04:27) Outreach
(04:46) Alternate optimization methods
(05:01) Appendix
(05:13) Toy examples
(06:57) How to run gradient descent on the human brain (longer version)
(09:25) Neural gradient descent: Organoid edition
(13:33) A more advanced sketch
The original text contained 3 footnotes which were omitted from this narration.
---

First published:
April 1st, 2024

Source:
https://www.lesswrong.com/posts/FGvN7aKgdmsTqJ6qF/gradient-descent-on-the-human-brain
---

Narrated by TYPE III AUDIO.
...more
18min
April 02, 2024 “Coherence of Caches and Agents” by johnswentworth
There's a lot of confusion about what coherence means for agents, and what "coherence theorems" do and don't say about agents. In this post, I'll talk about some particularly simple notions of coherence in a particularly simple setting. We'll see what nontrivial things coherence has to say, at least in a simple kind of environment, starting with an analogous notion of coherence for caches.
What Kind Of "Coherence" We're Talking About Here
Let's start with a standard CS-101-style example. We write a recursive python function to compute fibonacci numbers:
def fib(n):
if n == 0:
result = 1
elif n == 1:
result = 1
else:
result = fib(n-1) + fib(n-2)
return result
We pass in n = 0, then n = 1, then 2, then 3, etc. It spits out 1, 1, 2, 3, 5, 8, .... Great. Buuuuut it gets very slow very quickly as n increases; the [...]

---
Outline:
(00:28) What Kind Of Coherence Were Talking About Here
(04:52) Value Cache
(09:57) Generalization: Agents Value Function/Cache
(11:49) Coherence Is Not Utility-Dependent
(14:23) Coherence of Policies
(19:43) Summary and Takeaways
---

First published:
April 1st, 2024

Source:
https://www.lesswrong.com/posts/wjFijaAkSCceqCgGF/coherence-of-caches-and-agents
---

Narrated by TYPE III AUDIO.
...more
23min
April 01, 2024 “Announcing Suffering For Good” by Garrett Baker
TL;DR: We are excited to announce the new animal welfare organization Suffering For Good, a new factory farming charity aimed at vegans, where we use our excess profits to buy suffering offsets--in particular, an enormous number of rats on heroin.
For decades, even centuries, us vegans have been trying but failing to get the world to stop eating & torturing sentient minds that can definitely feel pain & suffer. But the global number of such minds tortured & killed just keeps on increasing. We at Suffering for Good think its time we just gave up, and ask ourselves "how can we use this to our advantage?"
We realized something when we asked that question. After decades of fighting this fight, we know far more about the factory farming industry than virtually anyone inside that industry. In that period of learning, and attempted dismantling, we had learned basically all the [...]

---

First published:
April 1st, 2024

Source:
https://www.lesswrong.com/posts/fZgWatNeTvK6FFdtW/announcing-suffering-for-good
---

Narrated by TYPE III AUDIO.
...more
3min

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 2,012 episodes available.

More shows like LessWrong (30+ Karma)

Making Sense with Sam Harris by Sam Harris

Making Sense with Sam Harris

26,446 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,389 Listeners

The Peter Attia Drive by Peter Attia, MD

The Peter Attia Drive

7,910 Listeners

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas by Sean Carroll | Wondery

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas

4,136 Listeners

ManifoldOne by Steve Hsu

ManifoldOne

87 Listeners

Your Undivided Attention by Tristan Harris and Aza Raskin, The Center for Humane Technology

Your Undivided Attention

1,462 Listeners

All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC

All-In with Chamath, Jason, Sacks & Friedberg

9,095 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

87 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

389 Listeners

Hard Fork by The New York Times

Hard Fork

5,432 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

15,174 Listeners

Moonshots with Peter Diamandis by PHD Ventures

Moonshots with Peter Diamandis

474 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

121 Listeners

Latent Space: The AI Engineer Podcast by swyx + Alessio

Latent Space: The AI Engineer Podcast

75 Listeners

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod

BG2Pod with Brad Gerstner and Bill Gurley

459 Listeners