AI Deep Dive

55: The Butterbench Problem



Large language models can write sonnets and debug code, but put that same "brain" into a robot and it often flunks kindergarten-level spatial tasks. In this episode we unpack the embodiment gap: the surprising results of the Andon Labs Butter-Bench (Gemini 2.5 Pro at roughly 40% task completion, Claude Opus 4.1 at roughly 37%), the Waymo cat incident, and why LLMs trained on text routinely ignore real-time sensor feedback and basic physics.

Then we flip the script: where robots are winning today is in extreme specialization. Swallowable spider-inspired capsules for cancer screening, bat-like echolocation microdrones for search-and-rescue, and Toyota's legged WalkMe mobility concept all show that task-focused design plus sensor-native control beats forcing a giant language brain into a body.

We also pull back the curtain on the business side: Apple's Siri pivot to Gemini on private cloud, OpenAI's blockbuster revenue and internal drama, and the engineering quirks (context compaction, odd sampling bugs, even em-dash fingerprints) that quietly shape product performance. The takeaway for marketers and AI builders: real-world value is emerging from small, cheap models and clever physical design, not just headline LLMs.

We close with the provocation every product leader should answer: teach the body to sense and act first, or keep scaling the brain? That choice shapes strategy, investment, and go-to-market moves in the next wave of AI.

AI Deep Dive, by Pete Larkin