March 23, 2026

VEGA-3D: Teaching multimodal LLMs spatial reasoning through video generation

32 minutes

A plug-and-play framework extracts implicit 3D priors from video diffusion models to enhance multimodal LLMs with spatial reasoning capabilities, enabling improved geometric scene understanding and embodied decision-making without explicit 3D supervision.

By Shaoqing Tan
