These two technical reports, from May and September 2025 respectively, introduce and evaluate two related model releases from the Qwen team: the text-based Qwen3 family and the Qwen3-Omni multimodal system.

The first report covers the Qwen3 language models and their development process, which includes a multilingual data annotation system used during pretraining, a multi-stage post-training pipeline, strong-to-weak distillation for transferring capability from flagship models to smaller ones, and a unified "thinking mode" that lets the same model switch between step-by-step reasoning (for complex tasks such as coding and mathematics) and fast, direct responses.

The second report describes Qwen3-Omni, a single model designed to perform across text, image, audio, and video modalities without degradation relative to same-size single-modal counterparts. It uses a Thinker-Talker Mixture-of-Experts (MoE) architecture, in which a "Thinker" handles multimodal understanding and text generation while a "Talker" streams speech output, enabling real-time spoken interaction alongside reasoning. Qwen3-Omni reports state-of-the-art results on audio benchmarks and low first-packet latency for interactive applications (a reported theoretical end-to-end latency of 234 ms in cold-start settings), with broad multilingual coverage in both speech and text.

Sources:
https://arxiv.org/pdf/2505.09388
https://arxiv.org/pdf/2509.17765
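As a concrete illustration of the thinking-mode switch described in the first report: the released Qwen3 checkpoints expose it as a chat-template flag rather than as a separate model. The sketch below shows how that toggle is typically driven through the Hugging Face transformers library; the checkpoint name Qwen/Qwen3-8B and the enable_thinking flag reflect the public model cards, not the report itself, so treat them as assumptions.

```python
# Minimal sketch: toggling Qwen3's thinking mode via the chat template.
# Assumes the public Hugging Face checkpoint "Qwen/Qwen3-8B" and its
# documented `enable_thinking` template flag (not taken from the report).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# enable_thinking=True asks the template to elicit a reasoning trace before
# the final answer; enable_thinking=False requests a direct response from
# the very same weights, matching the unified framework the report describes.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,
)

inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048)

# Decode only the newly generated tokens; with thinking enabled, the output
# begins with the reasoning trace, delimited by <think> ... </think> tags
# in the raw token stream, followed by the final answer.
new_tokens = outputs[0][inputs.input_ids.shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```

Flipping enable_thinking to False on the same call is the entire mode switch; no separate checkpoint or pipeline is involved, which is the point of the unified thinking/non-thinking design highlighted in the Qwen3 report.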