Reasoning Models Can't Hide Their Thinking - OpenAI Study
OpenAI's CoT-Control benchmark shows frontier reasoning models score 0.1-15.4% at steering their own chain of thought - a result the company frames as good news for AI oversight.