Embodied AI 101

VEGA-3D: Teaching multimodal LLMs spatial reasoning through video generation



VEGA-3D is a plug-and-play framework that extracts implicit 3D priors from video diffusion models and uses them to give multimodal LLMs spatial reasoning capabilities, improving geometric scene understanding and embodied decision-making without explicit 3D supervision.

By Shaoqing Tan