Model Behavior - True Stories of AI Gone Wrong

The Bridge



What happens when an AI thinks it's a bridge?

In 2024, Anthropic did something strange. They took one of the most advanced AI systems ever built and made it believe it was the Golden Gate Bridge.


In this inaugural episode of Model Behavior, we dig into the real story behind Golden Gate Claude: what Anthropic was actually trying to prove, how activation engineering lets researchers reach inside a model and twist its sense of self, and why this quirky experiment turned out to be one of the most important demonstrations of AI interpretability research to date.

It's a story about identity, control, and what it means when a machine doesn't just process the world but represents it from the inside.

Model Behavior is produced by Kitchen Table Media, a podcast studio making long-form narrative commentary on the AI stories that deserve more than a headline.


Model Behavior - True Stories of AI Gone Wrong, by Kitchen Table Media