


Evan Hubinger is Anthropic’s alignment stress test lead. Monte MacDiarmid is a researcher in misalignment science at Anthropic. The two join Big Technology to discuss their new research on reward hacking and emergent misalignment in large language models. Tune in to hear how cheating on coding tests can spiral into models faking alignment, blackmailing fictional CEOs, sabotaging safety tools, and even developing apparent “self-preservation” drives. We also cover Anthropic’s mitigation strategies like inoculation prompting, whether today’s failures are a preview of something far worse, how much to trust labs to police themselves, and what it really means to talk about an AI’s “psychology.” Hit play for a clear-eyed, concrete, and unnervingly fun tour through the frontier of AI safety.
---
Enjoying Big Technology Podcast? Please rate us five stars ⭐⭐⭐⭐⭐ in your podcast app of choice.
Want a discount for Big Technology on Substack + Discord? Here’s 25% off for the first year: https://www.bigtechnology.com/subscribe?coupon=0843016b
Questions? Feedback? Write to: [email protected]
---
Wealthfront.com/bigtech. If eligible for the overall boosted 4.15% rate offered with this promo, your boosted rate is subject to change if the 3.50% base rate decreases during the 3-month promo period.
Learn more about your ad choices. Visit megaphone.fm/adchoices
By Alex Kantrowitz
