May 29, 2026

Qwen-VLA: A Generalist Vision–Language–Action Robot Model

35 minutes

A single generalist vision-language-action (VLA) model built on Qwen3.5-4B with a 1.15B-parameter DiT flow-matching action decoder. It unifies manipulation, navigation, and trajectory prediction across 11 embodiments via text-described embodiment prompts. The model is trained in four stages and outperforms task-specific specialists on real ALOHA and simulation benchmarks without per-task fine-tuning.

By Shaoqing Tan
