


A multitude of forecasts discuss how powerful AIs might quickly arise and influence the world within the coming decades. I’ve run a variety of tabletop exercises created by the authors of AI 2027, the most famous such forecast, which aim to help participants understand the dynamics of worlds similar to AI 2027's. At one point in the exercise participants must decide how persuasive the AI models will be, during a time when AIs outperform humans at every remote work task and accelerate AI R&D by 100x. I think most participants underestimate how persuasive these AIs are. By default, I think powerful misaligned AIs will be extremely persuasive, especially absent mitigations.
Imagine such an AI wants to convince you, a busy politician, that your longtime advisor is secretly undermining you. Will you catch the lie out of all the other surprisingly correct advice the AI has given you?
The AI tells you directly about it, of course, in that helpful tone it always uses. But it's not just the AI. Your chief of staff mentioned that the advisor has been unusually absent for the last week, spending surprisingly little time at work and deferring quite a lot [...]
---
Outline:
(06:58) AI is being rapidly adopted, and people are already believing the AIs
(10:12) Humans believe what they are incentivized to, and the incentives will be to believe the AIs
(13:52) Will you hear the truth?
(18:54) What about mitigations?
(21:01) What if the AI is not worst-case?
(22:36) So what?
The original text contained 3 footnotes which were omitted from this narration.
---
First published:
Source:
---
Narrated by TYPE III AUDIO.
