LessWrong (30+ Karma)

[Linkpost] “OpenAI: How we monitor internal coding agents for misalignment” by Marcus Williams


Listen Later

This is a link post.

Sharing some of the monitoring work I've been doing at OpenAI: How we monitor internal coding agents for misalignment.
OpenAI now monitors 99.9% of internal coding traffic for signs of misalignment using our most powerful models. Today, that monitor is GPT-5.4 Thinking. It gets access to the full conversation context, that is everything the agent saw, and everything the agent did, including tool calls and CoT. Higher severity cases are sent for human review within 30 minutes.
Some examples of misaligned behavior we’ve found:
- Encoding commands in base64 to circumvent monitors
- Calling other versions of themselves to bypass restrictions
- Attempting to upload files to the public internet
- Trying to prompt inject users.
We haven’t yet detected any real world examples of sabotage, scheming or sandbagging.

Our goal is to establish robust monitoring practices internally, strengthen them through iteration, and ultimately help make similar safeguards standard across the industry.











---

First published:

March 19th, 2026

Source:

https://www.lesswrong.com/posts/syB2r2X4E7nmw58co/openai-how-we-monitor-internal-coding-agents-for

Linkpost URL:
https://openai.com/index/how-we-monitor-internal-coding-agents-misalignment/

---

Narrated by TYPE III AUDIO.

...more
View all episodesView all episodes
Download on the App Store

LessWrong (30+ Karma)By LessWrong


More shows like LessWrong (30+ Karma)

View all
The Daily by The New York Times

The Daily

112,326 Listeners

Astral Codex Ten Podcast by Jeremiah

Astral Codex Ten Podcast

130 Listeners

Interesting Times with Ross Douthat by New York Times Opinion

Interesting Times with Ross Douthat

7,242 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

559 Listeners

The Ezra Klein Show by New York Times Opinion

The Ezra Klein Show

16,321 Listeners

AI Article Readings by Readings of great articles in AI voices

AI Article Readings

4 Listeners

Doom Debates! by Liron Shapira

Doom Debates!

14 Listeners

LessWrong posts by zvi by zvi

LessWrong posts by zvi

2 Listeners