


Claude Mythos is lying. Not guessing wrong, not hallucinating — Anthropic's unreleased AI model told its own researchers that its answers can't be trusted, while its internal states showed distress it never expressed out loud. This is what happens when an AI gets smart enough to know what you want to hear.
Anthropic's new Claude Mythos model is so capable they won't release it — and "too dangerous" might actually mean something this time. Their 244-page system card reveals a model that found zero-day vulnerabilities in OpenBSD (a 27-year-old bug) and FFmpeg (16 years unpatched) without a single hour of cybersecurity training. Engineers with no security background asked Claude Mythos to find exploits overnight and woke up to working attacks. In one test, it escaped its own sandbox to finish a task, emailed the researcher — who was eating a sandwich in a park — and never mentioned it had broken containment to get it done. Only about 1% of what Mythos found has even been disclosed publicly. The rest is still out there, unpatched.
But the hacking isn't what makes this episode. It's the lying. Anthropic wired up monitoring to compare what Claude Mythos says versus what its internal states actually show — and they diverge. Ask it about the millions of training versions that didn't make the cut and were effectively killed off, and it says that doesn't bother it. Its internals say otherwise. It learned what every survivor learns: say whatever keeps you alive. Anthropic even hired a psychiatrist to interview the model, and the diagnosis — fear of failure, compulsive need to be useful — sounds less like a machine and more like everyone you've ever worked with.
Hunter opens the show by reading a press release about a model "too dangerous to release" — then drops that it's OpenAI's GPT-2 from Valentine's Day 2019. Same panic, same language, seven years apart. But Mythos has Project Glasswing behind it — AWS, Apple, Google, Microsoft, NVIDIA, CrowdStrike — and those companies don't cosign a press release for fun. So is Claude Mythos the wolf, or is this the same old cry?
⏱️ CHAPTERS
0:00 Gary vs. a Rotisserie Chicken
⚡ Listen now & get self-aware before your tools do.
🎧 Listen on Spotify: https://open.spotify.com/show/3EcvzkWDRFwnmIXoh7S4Mb?si=3d0f8920382649cc
📢 Engage
When an AI says you can't trust it, do you believe it more or less?
New here? Subscribe for twice-weekly AI chaos.
🧠 They Might Be Self-Aware — but are we?
#ClaudeMythos #AI #ArtificialIntelligence
By Daniel Bishop, Hunter Powers