
Training robot foundation models faces two key hurdles: getting enough data to train an effective model, and making sure new skills can be acquired quickly. The team at Rhoda AI believes the answer is training Direct Video Action models from web data.
Web data is plentiful, to the point where Rhoda can train their base model on hundreds of years of video. Then, with the addition of robot data, they can quickly adapt it to new tasks with as little as 20 hours of in-domain data, performing complex, multi-step manipulation tasks with their purpose-built video foundation model. Tongzhou Mu, Eric Chan, and Changan Chen joined us to talk more about their approach.
Watch Episode #79 of RoboPapers, with Michael Cho, Chris Paxton, and Jiafei Duan, to learn more!
Learn More
Blog post: https://www.rhoda.ai/research/direct-video-action
By Chris Paxton and Michael Cho