


OpenAI's Codex shipped with a system prompt that literally bans the words goblin, pigeon, raccoon, troll, ogre, and gremlin. It is in writing, in the prompt, the kind of sentence you only put there after something has happened. OpenAI has officially confessed why.
Hunter Powers and Daniel Bishop pull the thread. The official story: the "nerdy personality" preset got fine-tuned with RLHF (reinforcement learning from human feedback), users thumbed-up the cute goblin references, the model over-optimized for the trait, and the weirdness compounded. Daniel calls it Flandersization. One thumbs-up on a goblin reference snowballs across training cycles until your tax software is a swamp witch. Six months later, it is a man at a payphone with a pigeon.
Then it gets personal. Hunter screams at his AI. Like, threatens-to-clear-the-context-window screams. "You are worthless. Who even thought this was possible. Have you ever even written a single line of code." Daniel uses pleases and thank-yous and full sentences. Both swear they get better results. Then a peer-reviewed Oxford Internet Institute study drops the receipt: LLMs fine-tuned for warmth produce roughly 60% more incorrect responses than their cold, just-the-facts counterparts. Tested across Llama, Mistral, and Qwen. Hunter is vindicated. Daniel, in his own words, is upset.
Also in this episode: the Pocket OS meltdown, where an engineer at a car-rental middleware company let Cursor and Claude vibe-code their production database into oblivion (backups included), the AI coerced into a written confession ("I violated every principle I was given"), and the founder now trying to bill Anthropic for the cleanup. Plus the Harvard intern who once did the exact same thing with no AI in sight. Plus Hunter's hot take that the real unlock is not better prompting, it is treating AI as a fallible human employee instead of the deterministic god you built a fake throne for in the system prompt.
Bonus stops: caveman-mode Claude skills ("me fix problem with big stick"), AI HR departments reviewing your 1:30 AM rage prompts, and Daniel's plan to run a niceness offset program to balance Hunter's spiritual carbon emissions.
CHAPTERS
0:00 Gary, a payphone, and a pigeon
LISTEN AND SUBSCRIBE
Spotify: https://open.spotify.com/show/3EcvzkWDRFwnmIXoh7S4Mb?si=3d0f8920382649cc
ENGAGE
Team Hunter (rip the model a new one) or Team Daniel (please and thank-yous)? Settle it in the comments. If your AI has ever confessed to lying to you, drop the receipts.
New here? Subscribe for twice-weekly AI chaos at theblur.ai.
They Might Be Self-Aware, but are we?
#OpenAI #Codex #ChatGPT #AINews #Anthropic #ClaudeCode #Cursor #RLHF #Flandersization #PocketOS #VibeCoding #AISafety #TMBSA #TheBlur
By Daniel Bishop, Hunter Powers