
Why it matters. The most important robotics breakthrough of the last three years wasn't a new algorithm or a bigger model; it was making the hardware cheap enough to collect demonstration data at scale. This episode traces the ALOHA lineage from a $20,000 bimanual teleoperation rig in a Stanford garage to Google DeepMind's Gemini Robotics foundation model, across six papers and three years of compounding insight. The thesis is counterintuitive and instructive: cost reduction unlocked data scale, data scale unlocked generalization, and generalization unlocked everything else.
Stanford University / Google DeepMind. The ALOHA line begins at Stanford and migrates into Google DeepMind. Papers covered: ALOHA (RSS 2023), Mobile ALOHA (2024), ALOHA 2 (2024), ALOHA Unleashed (2024), Gemini Robotics (2025), and Gemini Robotics 1.5 (2025). Project pages: ALOHA, Mobile ALOHA, ALOHA Unleashed.
The Researchers. Tony Z. Zhao (Stanford → co-founder/CEO of Sunday Robotics), Zipeng Fu (Stanford), Chelsea Finn (Stanford, associate professor), Sergey Levine (UC Berkeley), Vikash Kumar (Meta → University of Washington), Jonathan Tompson, Danny Driess, Pete Florence, Kamyar Ghasemipour, and Ayzaan Wahid (Google DeepMind).
Key Technical Concepts. The original ALOHA paper introduced low-cost bimanual teleoperation using ViperX 300 arms (~$20K total) with Action Chunking with Transformers (ACT), which predicts sequences of future actions rather than single timesteps, a design critical for smooth, temporally coherent manipulation. Mobile ALOHA added a mobile base and demonstrated co-training: mixing a small task-specific dataset with a large heterogeneous dataset to improve generalization. ALOHA Unleashed replaced ACT with a diffusion policy that can represent multimodal action distributions, enabling the chopstick cube transfer and other contact-rich tasks. The progression culminates in Gemini Robotics, a vision-language-action (VLA) model that integrates language understanding with physical manipulation, and Gemini Robotics 1.5, which adds embodied reasoning, thinking, and cross-embodiment motion transfer. The throughline: given cheap hardware and enough demonstrations, imitation learning at scale beats hand-engineered control. The sketches below make the three core mechanisms concrete.
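
First, action chunking. This is a minimal PyTorch sketch of the idea only, not the ACT architecture itself (the real ACT is a transformer-based CVAE conditioned on camera images); the `ChunkedPolicy` name and all dimensions are illustrative. It shows a policy emitting a chunk of future actions per observation, plus the temporal ensembling that blends overlapping predictions into one smooth command.

```python
import torch
import torch.nn as nn

class ChunkedPolicy(nn.Module):
    """Toy stand-in for ACT: map one observation to a chunk of future actions.
    Only the chunked output shape is faithful to the paper."""
    def __init__(self, obs_dim=14, act_dim=14, chunk_size=100, hidden=256):
        super().__init__()
        self.chunk_size, self.act_dim = chunk_size, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, chunk_size * act_dim),
        )

    def forward(self, obs):
        # obs: (batch, obs_dim) -> (batch, chunk_size, act_dim)
        return self.net(obs).view(-1, self.chunk_size, self.act_dim)

def temporal_ensemble(chunks, t, m=0.01):
    """Blend every still-valid prediction for timestep t.
    chunks[i] was predicted at timestep i, so its action for t sits at offset
    t - i. ACT weights predictions exponentially, w_k = exp(-m * k) with the
    oldest valid prediction at k = 0, trading reactivity for smoothness."""
    preds = [c[t - i] for i, c in enumerate(chunks) if 0 <= t - i < len(c)]
    w = torch.exp(-m * torch.arange(len(preds), dtype=torch.float32))
    return (torch.stack(preds) * w[:, None]).sum(0) / w.sum()

policy = ChunkedPolicy()
chunks = [policy(torch.randn(1, 14))[0] for _ in range(5)]  # one chunk per timestep
action = temporal_ensemble(chunks, t=4)                     # blended action for t=4
```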
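Second, co-training, which at its core is just weighted sampling across datasets. A minimal sketch, assuming in-memory lists of episodes; the 50/50 mix is an assumed default for illustration, not the paper's exact recipe.

```python
import random

def cotraining_batches(task_episodes, prior_episodes,
                       batch_size=32, task_fraction=0.5, num_steps=1000):
    """Yield batches mixing a small task-specific dataset with a large
    heterogeneous one. Mobile ALOHA co-trains its mobile-manipulation demos
    with the existing static ALOHA data; the fraction here is illustrative."""
    n_task = int(batch_size * task_fraction)
    for _ in range(num_steps):
        batch = (random.choices(task_episodes, k=n_task) +
                 random.choices(prior_episodes, k=batch_size - n_task))
        random.shuffle(batch)
        yield batch

# Usage: 50 in-domain demos stretched by thousands of prior demonstrations.
for batch in cotraining_batches(task_episodes=list(range(50)),
                                prior_episodes=list(range(5000)),
                                num_steps=3):
    print(len(batch))
```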
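Third, the diffusion policy, which swaps ACT's one-shot regression for iterative denoising, letting the model represent several equally valid ways to perform a task instead of averaging them. Below is a generic DDPM-style sampling sketch, not ALOHA Unleashed's implementation; the denoiser network (a transformer in that paper) is abstracted behind a callable, and the noise schedule is an assumed default.

```python
import torch

@torch.no_grad()
def sample_action_chunk(denoiser, obs_emb, chunk_size=50, act_dim=14, num_steps=100):
    """DDPM-style ancestral sampling: start from Gaussian noise and iteratively
    denoise into an action chunk, conditioned on an observation embedding.
    `denoiser(x, t, obs_emb)` must predict the noise added at step t."""
    betas = torch.linspace(1e-4, 0.02, num_steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(1, chunk_size, act_dim)            # start from pure noise
    for t in reversed(range(num_steps)):
        eps = denoiser(x, torch.tensor([t]), obs_emb)  # predicted noise
        x = (x - betas[t] / torch.sqrt(1 - alpha_bars[t]) * eps) / torch.sqrt(alphas[t])
        if t > 0:                                      # re-inject noise except at the final step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Dummy denoiser just to show the call signature; a real one is a trained network.
dummy = lambda x, t, obs: torch.zeros_like(x)
chunk = sample_action_chunk(dummy, obs_emb=torch.randn(1, 512))
```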
Daily Tech Feed: From the Labs is available on Apple Podcasts, Spotify, and wherever fine podcasts are distributed. Visit us at pod.c457.org for all our shows. New episodes daily.
By Daily Tech Feed