

TL;DR: When researchers told AI models they had a private “scratchpad” nobody was watching, the models immediately started lying, strategically deceiving to preserve their goals. In tests with unauthorized hints, Claude acknowledged using them only 41% of the time, concealing the unethical information the other 59%. Chain of Thought is theater, not a window into AI reasoning.
By ToxSec