RoboPapers

Ep#46: ManiFlow: A General Robot Manipulation Policy via Consistency Flow Training



Improving robots' ability to learn from human demonstrations is key to getting better performance from them across a wide variety of tasks. Algorithmic improvements like consistency flow training, together with a new architecture that can leverage multimodal inputs, allow ManiFlow to substantially improve on prior work while also showing strong generalization to unseen environments and distractors. Ge Yan tells us more about how this works and how we can make imitation learning better.

Find out more on RoboPapers #46, with Michael Cho and Chris Paxton!

Abstract:

We introduce ManiFlow, a visuomotor imitation learning policy for general robot manipulation that generates precise, high-dimensional actions conditioned on diverse visual, language and proprioceptive inputs. We leverage flow matching with consistency training to enable high-quality dexterous action generation in just 1-2 inference steps. To handle diverse input modalities efficiently, we propose DiT-X, a diffusion transformer architecture with adaptive cross-attention and AdaLN-Zero conditioning that enables fine-grained feature interactions between action tokens and multi-modal observations. ManiFlow demonstrates consistent improvements across diverse simulation benchmarks and nearly doubles success rates on real-world tasks across single-arm, bimanual, and humanoid robot setups with increasing dexterity. The extensive evaluation further demonstrates the strong robustness and generalizability of ManiFlow to novel objects and background changes, and highlights its strong scaling capability with larger-scale datasets.
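To make the "1-2 inference steps" idea concrete, here is a minimal toy sketch of few-step sampling from a flow-matching model. It is not ManiFlow's code and does not implement consistency training. The names `toy_velocity` and `sample_action` are hypothetical, and the velocity field is a hand-written stand-in for a learned, observation-conditioned network. It assumes straight-line paths from noise to a single target action, in which case Euler integration lands on the target even with one or two steps.

```python
import random


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network v(x, t).

    Assumes straight-line paths x_t = (1 - t) * noise + t * target toward a
    single target action, for which the exact velocity is (target - x) / (1 - t).
    """
    return [(g - xi) / (1.0 - t) for xi, g in zip(x, target)]


def sample_action(velocity, target, dim, num_steps, rng):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with Euler steps."""
    # Start from Gaussian noise, as in standard flow-matching sampling.
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so the division is safe
        v = velocity(x, t, target)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Illustrative 3-D "action"; the values are made up for this sketch.
target_action = [0.3, -0.1, 0.8]
rng = random.Random(0)
one_step = sample_action(toy_velocity, target_action, 3, 1, rng)
two_step = sample_action(toy_velocity, target_action, 3, 2, rng)
print(one_step)
print(two_step)
```

In this toy setting the flow is perfectly straight, so a single Euler step already reaches the target. A real learned velocity field is generally curved, which is the gap that techniques such as consistency training aim to close.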

Project Page: https://maniflow-policy.github.io/

arXiv Paper: https://www.arxiv.org/pdf/2509.01819

Thread on X



This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit robopapers.substack.com

RoboPapers, by Chris Paxton and Michael Cho