Adrian Wedd

The Thinking Chain Leak


Listen Later

A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.
...more
View all episodesView all episodes
Download on the App Store

Adrian WeddBy Adrian Wedd