
Training robot foundation models faces two key hurdles: getting enough data to train an effective model, and making sure new skills can be acquired quickly. The team at Rhoda AI believes the answer is training Direct Video Action models from web data.
Web data is plentiful, to the point where Rhoda can train their base model on hundreds of years of video. Then, with the addition of robot data, they can quickly adapt it to new tasks with as little as 20 hours of in-domain data, performing complex, multi-step manipulation tasks with their purpose-built video foundation model. Tongzhou Mu, Eric Chan, and Changan Chen joined us to talk more about their approach.
Watch Episode #79 of RoboPapers, with Michael Cho, Chris Paxton, and Jiafei Duan, to learn more!
Learn More
Blog post: https://www.rhoda.ai/research/direct-video-action
By Chris Paxton and Michael Cho