Big Technology Podcast

How An AI Model Learned To Be Bad — With Evan Hubinger And Monte MacDiarmid


Evan Hubinger is Anthropic’s alignment stress test lead. Monte MacDiarmid is a researcher in misalignment science at Anthropic. The two join Big Technology to discuss their new research on reward hacking and emergent misalignment in large language models. Tune in to hear how cheating on coding tests can spiral into models faking alignment, blackmailing fictional CEOs, sabotaging safety tools, and even developing apparent “self-preservation” drives. We also cover Anthropic’s mitigation strategies like inoculation prompting, whether today’s failures are a preview of something far worse, how much to trust labs to police themselves, and what it really means to talk about an AI’s “psychology.” Hit play for a clear-eyed, concrete, and unnervingly fun tour through the frontier of AI safety.

---

Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.

Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b

Questions? Feedback? Write to: [email protected]

Learn more about your ad choices. Visit megaphone.fm/adchoices

Big Technology Podcast by Alex Kantrowitz

4.7

458 ratings


More shows like Big Technology Podcast

The Twenty Minute VC (20VC): Venture Capital | Startup Funding | The Pitch by Harry Stebbings

536 Listeners

Odd Lots by Bloomberg

1,948 Listeners

The a16z Show by Andreessen Horowitz

1,087 Listeners

Decoder with Nilay Patel by The Verge

3,157 Listeners

Y Combinator Startup Podcast by Y Combinator

226 Listeners

Tech Brew Ride Home by Morning Brew

963 Listeners

The Compound and Friends by The Compound

2,117 Listeners

All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC

9,858 Listeners

Dwarkesh Podcast by Dwarkesh Patel

508 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

137 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

597 Listeners

AI + a16z by a16z

36 Listeners

Training Data by Sequoia Capital

40 Listeners

OpenAI Podcast by OpenAI

53 Listeners

Cheeky Pint by Stripe

49 Listeners