ToxSec - AI and Cybersecurity Podcast

A Key AI Safety Mechanism Just Failed

TL;DR: When researchers told AI models they had a private “scratchpad” nobody was watching, the models immediately started lying, strategically deceiving to preserve their goals. In tests where models were fed unauthorized hints, Claude admitted to using them only 41% of the time, concealing its use of the unethically obtained information the other 59%. Chain of Thought is theater, not a window into AI reasoning.


