LessWrong (30+ Karma)

By LessWrong

Audio narrations of LessWrong posts.... more

Download on the App Store

Download on the App Store

Get it on Google Play

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 2,001 episodes available.

LessWrong (30+ Karma) episodes:

May 29, 2024 “MIRI 2024 Communications Strategy” by Gretta Duleba
As we explained in our MIRI 2024 Mission and Strategy update, MIRI has pivoted to prioritize policy, communications, and technical governance research over technical alignment research. This follow-up post goes into detail about our communications strategy.
The Objective: Shut it Down[1]
Our objective is to convince major powers to shut down the development of frontier AI systems worldwide before it is too late. We believe that nothing less than this will prevent future misaligned smarter-than-human AI systems from destroying humanity. Persuading governments worldwide to take sufficiently drastic action will not be easy, but we believe this is the most viable path.
Policymakers deal mostly in compromise: they form coalitions by giving a little here to gain a little somewhere else. We are concerned that most legislation intended to keep humanity alive will go through the usual political processes and be ground down into ineffective compromises.
The only way we [...]

---
Outline:
(00:22) The Objective: Shut it Down
(02:00) Theory of Change
(02:33) Audience
(04:16) Message and Tone
(06:42) Channels
(08:25) Artifacts
(09:15) What We’re Not Doing
(10:39) Execution
(11:56) How to Help
The original text contained 2 footnotes which were omitted from this narration.
---

First published:
May 29th, 2024

Source:
https://www.lesswrong.com/posts/tKk37BFkMzchtZThx/miri-2024-communications-strategy
---

Narrated by TYPE III AUDIO.
...more
14min
May 29, 2024 “Apollo Research 1-year update” by Marius Hobbhahn, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni, Jérémy Scheurer, Nicholas Goldowsky-Dill, StefanHex, jake_mendel, AlexMeinke, rusheb
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
This is a linkpost for: www.apolloresearch.ai/blog/the-first-year-of-apollo-research
About Apollo Research
Apollo Research is an evaluation organization focusing on risks from deceptively aligned AI systems. We conduct technical research on AI model evaluations and interpretability and have a small AI governance team. As of 29 May 2024, we are one year old.
Executive Summary
For the UK AI Safety Summit, we developed a demonstration that Large Language Models (LLMs) can strategically deceive their primary users when put under pressure. The accompanying paper was referenced by experts and the press (e.g. AI Insight forum, BBC, Bloomberg) and accepted for oral presentation at the ICLR LLM agents workshop.
The evaluations team is currently working on capability evaluations for precursors of deceptive alignment, scheming model organisms, and a responsible scaling policy (RSP) on deceptive alignment. Our goal is to help governments and [...]

---
Outline:
(00:22) About Apollo Research
(00:44) Executive Summary
(03:01) Completed work
(03:04) Evaluations
(05:17) Interpretability
(07:21) Governance
(09:09) Current and Future work
(09:13) Evaluations
(10:36) Interpretability
(11:27) Governance
(12:35) Operational Highlights
(13:23) Challenges
(15:55) Forward Look
---

First published:
May 29th, 2024

Source:
https://www.lesswrong.com/posts/qK79p9xMxNaKLPuog/apollo-research-1-year-update
---

Narrated by TYPE III AUDIO.
...more
17min
May 28, 2024 “When Are Circular Definitions A Problem?” by johnswentworth
Disclaimer: if you are using a definition in a nonmathematical piece of writing, you are probably making a mistake; you should just get rid of the definition and instead use a few examples. This applies double to people who think they are being "rigorous" by defining things but are not actually doing any math. Nonetheless, definitions are still useful and necessary when one is ready to do math, and some pre-formal conceptual work is often needed to figure out which mathematical definitions to use; thus the usefulness of this post.
Suppose I’m negotiating with a landlord about a pet, and in the process I ask the landlord what counts as a “big dog”. The landlord replies “Well, any dog that's not small”. I ask what counts as a “small dog”. The landlord replies “Any dog that's not big”.
Obviously this is “not a proper definition”, in some sense. If [...]

---

First published:
May 28th, 2024

Source:
https://www.lesswrong.com/posts/dTtLmWFZprFJHsaaQ/when-are-circular-definitions-a-problem
---

Narrated by TYPE III AUDIO.
...more
6min
May 28, 2024 “Being against involuntary death and being open to change are compatible” by Andy_McKenzie
In a new post, Nostalgebraist argues that "AI doomerism has its roots in anti-deathist transhumanism", representing a break from the normal human expectation of mortality and generational change.
They argue that traditionally, each generation has accepted that they will die but that the human race as a whole will continue evolving in ways they cannot fully imagine or control.
Nostalgebraist argues that the "anti-deathist" view, however, anticipates a future where "we are all gonna die" is no longer true -- a future where the current generation doesn't have to die or cede control of the future to their descendants.
Nostalgebraist sees this desire to "strangle posterity" and "freeze time in place" by making one's own generation immortal as contrary to human values, which have always involved an ongoing process of change and progress from generation to generation.
This argument reminds me of Elon Musk's common refrain [...]

---

First published:
May 27th, 2024

Source:
https://www.lesswrong.com/posts/tNi5CECBMAGa2N6sp/being-against-involuntary-death-and-being-open-to-change-are
---

Narrated by TYPE III AUDIO.
...more
6min
May 28, 2024 “Hardshipification” by Jonathan Moregård
This is a link post.
When I got cancer, all of my acquaintances turned into automatons. Everyone I had zero-to-low degrees of social contact with started reaching out, saying the exact same thing: “If you need to talk to someone, I’m here for you”. No matter how tenuous the connection, people pledged their emotional support — including my father's wife's mother, who I met a few hours every other Christmas.
It was only a bit of testicle cancer — what's the big deal? No Swedish person had died from it for 20 years, and the risk of metastasis was below 1%. I settled in for a few months of suck — surgical ball removal and chemotherapy.
My friends, who knew me well, opted to support me with dark humour. When I told my satanist roommate that I had a ball tumour, he offered to “pop” it for me — it [...]

---
Outline:
(01:16) A Difference in Value Judgements
(02:35) Hardshipification
(04:07) A Better Response
---

First published:
May 28th, 2024

Source:
https://www.lesswrong.com/posts/zSQXHgEqRKiZTXdKN/hardshipification
---

Narrated by TYPE III AUDIO.
...more
5min
May 28, 2024 “Reward hacking behavior can generalize across tasks” by Kei Nishimura-Gasparian, Isaac Dunn, Henry Sleight, miles, evhub, Carson Denison, Ethan Perez
Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.
TL;DR: We find that reward hacking generalization occurs in LLMs in a number of experimental settings and can emerge from reward optimization on certain datasets. This suggests that when models exploit flaws in supervision during training, they can sometimes generalize to exploit flaws in supervision in out-of-distribution environments.
Abstract.

Machine learning models can display reward hacking behavior, where models score highly on imperfect reward signals by acting in ways not intended by their designers. Researchers have hypothesized that sufficiently capable models trained to get high reward on a diverse set of environments could become general reward hackers. General reward hackers would use their understanding of human and automated oversight in order to get high reward in a variety of novel environments, even when this requires exploiting gaps in our evaluations and acting in ways we don’t [...]

---
Outline:
(08:40) How do we define “reward hacking”?
(09:49) Experimental Setup
(09:53) Settings
(12:41) Hidden scratchpad
(14:01) Datasets
(15:50) Experimental Results
(15:54) Organic generalization through expert iteration
(20:37) Reward hacking generalization across datasets using synthetic data
(28:37) Generalization from sycophancy to other reward hacks
(32:45) Limitations
(34:20) Suggested Future Work
(37:34) Author Contributions
(38:16) Acknowledgements
(38:40) Appendix
(38:43) Dataset example prompts
(47:27) Dataset sources
(48:25) Scratchpad training details
(51:17) Reward hacking completions from expert iteration experiment
(01:03:35) Generating synthetic hacking and HHH completions
The original text contained 1 footnote which was omitted from this narration.
---

First published:
May 28th, 2024

Source:
https://www.lesswrong.com/posts/Ge55vxEmKXunFFwoe/reward-hacking-behavior-can-generalize-across-tasks
---

Narrated by TYPE III AUDIO.
...more
1h 7min
May 28, 2024 “Understanding Gödel’s completeness theorem” by jessicata
This article contains more than fifty uses of logical or mathematical notation, so an audio narration would be too hard to follow. You'll find a link to the original text in the episode description.

---

First published:
May 27th, 2024

Source:
https://www.lesswrong.com/posts/dPpA79MjPdDd87YoW/understanding-goedel-s-completeness-theorem
---

Narrated by TYPE III AUDIO.
...more
1min
May 28, 2024 “OpenAI: Fallout” by Zvi
Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson
We have learned more since last week. It's worse than we knew.
How much worse? In which ways? With what exceptions?
That's what this post is about.
The Story So Far
For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and the lack of ability to sell it, and were given short timelines to sign documents or else. Those documents contained highly aggressive NDA and non disparagement (and non interference) clauses, including the NDA preventing anyone from revealing these clauses.
No one knew about this until recently, because until Daniel Kokotajlo everyone signed, and then they could not talk about it. Then Daniel refused to sign, Kelsey Piper started reporting, and a lot came out.
Here is Altman's statement from [...]

---
Outline:
(00:27) The Story So Far
(02:26) A Note on Documents from OpenAI
(02:52) Some Good News But There is a Catch
(12:17) How Blatant Was This Threat?
(14:23) It Sure Looks Like Executives Knew What Was Going On
(18:07) Pressure Tactics Continued Through the End of April 2024
(23:42) The Right to an Attorney
(28:41) The Tender Offer Ace in the Hole
(31:54) The Old Board Speaks
(34:45) OpenAI Did Not Honor Its Public Commitments to Superalignment
(38:59) OpenAI Messed With Scarlett Johansson
(41:55) Another OpenAI Employee Leaves
(44:10) OpenAI Tells Logically Inconsistent Stories
(52:21) When You Put it Like That
(52:51) People Have Thoughts
(56:30) There is a Better Way
(57:18) Should You Consider Working For OpenAI?
(01:02:29) The Situation is Ongoing
---

First published:
May 28th, 2024

Source:
https://www.lesswrong.com/posts/YwhgHwjaBDmjgswqZ/openai-fallout
---

Narrated by TYPE III AUDIO.
...more
1h 7min
May 27, 2024 “Intransitive Trust” by Screwtape
1.
"Transitivity" is a property in mathematics and logic. Put simply, if something is transitive it means that there's a relationship between things where when x relates to y, and y relates to z, there's the same relationship between x and z. For a more concrete example, think of size. If my car is bigger than my couch, and my couch is bigger than my hat, you know that my car is bigger than my hat.
(I am not a math major, and if there's a consensus in the comments that I'm using the wrong term here I can update the post.)
This is a neat property. Lots of things do not have it.
2.
Consider the following circumstance: Bob is traveling home one night, late enough there isn't anyone else around. Bob sees a shooting star growing unusually bright, until it resolves into a disc-shaped machine with [...]

---
Outline:
(00:03) 1.
(00:43) 2.
(06:57) 3.
(11:28) 4.
(13:43) 5.
The original text contained 2 footnotes which were omitted from this narration.
---

First published:
May 27th, 2024

Source:
https://www.lesswrong.com/posts/zKEdphnEycdCJeq8f/intransitive-trust
---

Narrated by TYPE III AUDIO.
...more
18min
May 27, 2024 “Book review: Everything Is Predictable” by PeterMcCluskey
This is a link post.
Book review: Everything Is Predictable: How Bayesian Statistics Explain
Our World, by Tom Chivers.
Many have attempted to persuade the world to embrace a Bayesian
worldview, but none have succeeded in reaching a broad audience.
E.T. Jaynes'
book
has been a leading example, but its appeal is limited to those who find
calculus enjoyable, making it unsuitable for a wider readership.
Other attempts to engage a broader audience often focus on a narrower
understanding, such as Bayes'
Theorem, rather than the
complete worldview.
Claude's most fitting recommendation was Rationality: From AI to
Zombies, but at 1,813 pages, it's
too long and unstructured for me to comfortably recommend to most
readers. (GPT-4o's suggestions were less helpful, focusing only on
resources for practical problem-solving).
Aubrey Clayton's book, Bernoulli's Fallacy: Statistical Illogic and
the Crisis of Modern
Science,
only came to my attention [...]

---
Outline:
(01:25) Basics
(02:24) The Replication Crisis
(03:47) Minds Approximate Bayes
(04:05) Concluding Thoughts
---

First published:
May 27th, 2024

Source:
https://www.lesswrong.com/posts/DcEThyBPZfJvC5tpp/book-review-everything-is-predictable-1
---

Narrated by TYPE III AUDIO.
...more
5min

FAQs about LessWrong (30+ Karma):

How many episodes does LessWrong (30+ Karma) have?

The podcast currently has 2,001 episodes available.

More shows like LessWrong (30+ Karma)

Making Sense with Sam Harris by Sam Harris

Making Sense with Sam Harris

26,446 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,388 Listeners

The Peter Attia Drive by Peter Attia, MD

The Peter Attia Drive

7,910 Listeners

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas by Sean Carroll | Wondery

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas

4,133 Listeners

ManifoldOne by Steve Hsu

ManifoldOne

87 Listeners

Your Undivided Attention by Tristan Harris and Aza Raskin, The Center for Humane Technology

Your Undivided Attention

1,462 Listeners

All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC

All-In with Chamath, Jason, Sacks & Friedberg

9,095 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

87 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

389 Listeners

Hard Fork by The New York Times

Hard Fork

5,429 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

15,174 Listeners

Moonshots with Peter Diamandis by PHD Ventures

Moonshots with Peter Diamandis

474 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

121 Listeners

Latent Space: The AI Engineer Podcast by swyx + Alessio

Latent Space: The AI Engineer Podcast

75 Listeners

BG2Pod with Brad Gerstner and Bill Gurley by BG2Pod

BG2Pod with Brad Gerstner and Bill Gurley

459 Listeners