
Ask ChatGPT how to build a bomb, and it will flatly respond that it “can’t help with that.” But users have long played a cat-and-mouse game to try to trick language models into providing forbidden information. Just as quickly as these “jailbreaks” appear, AI companies patch them by simply filtering out forbidden prompts before they ever reach the model itself.
Recently, cryptographers have shown how the defensive filters put around powerful language models can be subverted by well-studied cryptographic tools. In fact, they’ve shown how the very nature of this two-tier system — a filter that protects a powerful language model inside it — creates gaps in the defenses that can always be exploited. In this episode, Quanta executive editor Michael Moyer tells Samir Patel about the findings and implications of this new work.
Audio coda courtesy of Banana Breakdown.
By Quanta Magazine
