Embodied AI 101

Episode 28: Self Forcing: Bridging the Train–Test Gap in Autoregressive Video Diffusion



Recent advances in text-to-video generation have achieved impressive fidelity and complex temporal dynamics in short clips. However, many state-of-the-art video diffusion models operate in a **non-causal** fashion: they generate an entire video in one go with bidirectional attention across time. This means future frames can influence past ones during generation, yielding high quality but precluding real-time use cases where future information isn’t available at inference time. By contrast, **...

Embodied AI 101, by Shaoqing Tan