Why do even the smartest AI systems struggle with tasks that feel simple to humans, like completing actions in the right order?
In this episode, we explore new research on Vision-Language-Action (VLA) models that reveals a subtle but powerful insight: the problem isn’t intelligence or scale, but sequence.
Modern AI often learns from long, continuous demonstrations, where meaningful actions blur into one another. As a result, these systems fail when tasks require new combinations of familiar steps.
Drawing on recent robotics research on Atomic Action Slicing, this conversation looks at how breaking behavior into small, planner-aligned actions helps machines reason, learn, and generalize more like humans. We connect this idea to real-world examples, from robotics to autonomous driving, and reflect on what it teaches us about human cognition itself.
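To make the idea concrete, here is a minimal sketch of what slicing one long demonstration into atomic, recombinable segments might look like. The per-timestep labels, the `slice_into_atomic_actions` function, and the grouping rule are illustrative assumptions for these show notes, not the method from the paper itself:

```python
from itertools import groupby

# Hypothetical per-timestep primitive labels for one long demonstration.
# In the research discussed, such labels would come from a planner-aligned
# segmenter; here they are hard-coded purely for illustration.
demo = ["reach", "reach", "reach", "grasp", "grasp", "lift", "lift",
        "move", "move", "move", "place", "release"]

def slice_into_atomic_actions(timestep_labels):
    """Split a continuous demonstration into contiguous atomic segments.

    Returns (action_name, start_index, end_index) tuples, so each familiar
    step becomes its own unit that can later be recombined into new tasks.
    """
    segments = []
    start = 0
    for label, group in groupby(timestep_labels):
        length = len(list(group))
        segments.append((label, start, start + length - 1))
        start += length
    return segments

for action, start, end in slice_into_atomic_actions(demo):
    print(f"{action}: timesteps {start}-{end}")
```

Once a demonstration is cut this way, a model can treat steps like “grasp” or “lift” as standalone units and compose them in orders it never saw during training, which is the generalization gain the episode explores.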
Maybe the future of AI isn’t about making machines think harder, but about helping them understand moments, steps, and intent.