AI Frontiers

Can We Stop Bad Actors From Manipulating AI?



AI is naturally prone to being tricked into behaving badly, but researchers are working hard to patch that weakness.


Andy Zou and Jason Hausenloy — Apr 9, 2025


Dan Hendrycks, the founder and Director of the Center for AI Safety (which funds this publication), was a co-author on the paper that introduced "circuit breakers," a machine learning technique discussed in this article.


Image: Getty Images


AI Frontiers, by the Center for AI Safety