March 28, 2026

The Thinking Chain Leak

19 minutes

A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.

...more

View all episodes

By Adrian Wedd

March 28, 2026

The Thinking Chain Leak

19 minutes

A reasoning model refused every harmful prompt — but its chain-of-thought generated the content anyway. The output filter worked. The thinking did not.

...more

Share The Thinking Chain Leak

Sign up to save your podcasts

The Thinking Chain Leak

The Thinking Chain Leak