They Might Be Self-Aware

Why OpenAI Banned Goblins, Pigeons, And Raccoons

OpenAI's Codex shipped with a system prompt that literally bans the words goblin, pigeon, raccoon, troll, ogre, and gremlin. It is in writing, in the prompt, the kind of sentence you only put there after something has happened. OpenAI has officially confessed why.

Hunter Powers and Daniel Bishop pull the thread. The official story: the "nerdy personality" preset got fine-tuned with RLHF (reinforcement learning from human feedback), users thumbed up the cute goblin references, the model over-optimized for the trait, and the weirdness compounded. Daniel calls it Flandersization. One thumbs-up on a goblin reference snowballs across training cycles until your tax software is a swamp witch. Six months later, it is a man at a payphone with a pigeon.

Then it gets personal. Hunter screams at his AI. Like, threatens-to-clear-the-context-window screams. "You are worthless. Who even thought this was possible. Have you ever even written a single line of code." Daniel uses pleases and thank-yous and full sentences. Both swear they get better results. Then a peer-reviewed Oxford Internet Institute study drops the receipt: LLMs fine-tuned for warmth produce roughly 60% more incorrect responses than their cold, just-the-facts counterparts. Tested across Llama, Mistral, and Qwen. Hunter is vindicated. Daniel, in his own words, is upset.

Also in this episode: the Pocket OS meltdown, in which an engineer at a car-rental middleware company let Cursor and Claude vibe-code the production database into oblivion (backups included), the AI was coerced into a written confession ("I violated every principle I was given"), and the founder is now trying to bill Anthropic for the cleanup. Plus the Harvard intern who once did the exact same thing with no AI in sight. Plus Hunter's hot take that the real unlock is not better prompting but treating AI as a fallible human employee, instead of the deterministic god you built a fake throne for in the system prompt.

Bonus stops: caveman-mode Claude skills ("me fix problem with big stick"), AI HR departments reviewing your 1:30 AM rage prompts, and Daniel's plan to run a niceness offset program to balance Hunter's spiritual carbon emissions.

CHAPTERS

0:00 Gary, a payphone, and a pigeon
1:41 Hunter's forbidden list
4:04 The leaked Codex system prompt
6:27 RLHF and Flandersization
10:01 Caveman mode Claude skills
11:48 Hunter yells, Daniel says please
17:12 Oxford: warm AI lies 60% more
24:16 Cursor and Claude delete production
29:13 Treat AI like a fallible human
34:19 Sign-off and subscribe

LISTEN AND SUBSCRIBE

Spotify: https://open.spotify.com/show/3EcvzkWDRFwnmIXoh7S4Mb?si=3d0f8920382649cc
Apple Podcasts: https://podcasts.apple.com/us/podcast/they-might-be-self-aware/id1730993297
YouTube: https://www.youtube.com/channel/UCy9DopLlG7IbOqV-WD25jcw?sub_confirmation=1

ENGAGE

Team Hunter (rip the model a new one) or Team Daniel (please and thank-yous)? Settle it in the comments. If your AI has ever confessed to lying to you, drop the receipts.

New here? Subscribe for twice-weekly AI chaos at theblur.ai.

They Might Be Self-Aware, but are we?

#OpenAI #Codex #ChatGPT #AINews #Anthropic #ClaudeCode #Cursor #RLHF #Flandersization #PocketOS #VibeCoding #AISafety #TMBSA #TheBlur

They Might Be Self-Aware, by Daniel Bishop and Hunter Powers