
Why it matters. On the night Alibaba shipped Qwen3.5 — a 397-billion-parameter sparse mixture-of-experts model with 17B active parameters, a 1M-token context window, and a small-model family the open-source community had been waiting for — they fired the person who built it. This is the story of what happens when a corporate open-source champion builds something so good it undermines the product his employer is trying to sell.
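The "397 billion total, 17B active" split comes from sparse mixture-of-experts routing: each token is sent to only a few expert networks, so most parameters sit idle on any given forward pass. A minimal sketch of the idea, with the router, expert count, and dimensions all invented for illustration (Alibaba has not published Qwen3.5's routing internals):

```python
import numpy as np

def topk_route(logits, k=2):
    """Return the indices of the k highest-scoring experts plus softmax
    weights over just those k. Toy router: expert count, k, and dimensions
    below are made up, not Qwen3.5's published configuration."""
    idx = np.argsort(logits)[::-1][:k]            # top-k expert indices
    w = np.exp(logits[idx] - logits[idx].max())   # numerically stable softmax
    return idx, w / w.sum()

def moe_forward(x, experts, router_w, k=2):
    """Run a token through only its top-k experts and mix their outputs.
    Only k of len(experts) weight matrices are touched per token, which is
    why a sparse MoE can hold 397B parameters but activate only ~17B."""
    idx, weights = topk_route(router_w @ x, k)
    return sum(w * (experts[i] @ x) for i, w in zip(idx, weights))

rng = np.random.default_rng(0)
n_experts, d = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal(d)
y = moe_forward(x, experts, router_w, k=2)  # 2 of 8 experts do any work
```

Per token, only `k` expert matrices are multiplied, so the active-parameter count scales with `k / n_experts` rather than with total model size.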
Alibaba / Qwen Team. Qwen3.5 was released February 16, 2026 by the Qwen Team at Alibaba Cloud. The flagship Qwen3.5-397B-A17B is available on HuggingFace under Apache 2.0. The small-model family (4B through 35B) runs on consumer hardware — 8B on a single GPU, 14B on a gaming rig, 32B on a Mac Studio. Full model collection at Qwen on HuggingFace. The official blog post is at qwen.ai. Coverage: TechCrunch, Bloomberg, VentureBeat, OfficeChai.
The Researchers. Junyang Lin (@JustinLin610) was the tech lead of the Qwen team at Alibaba Group, with an affiliation at Peking University, and the person who kept the open-weight commitment alive through years of internal corporate pressure. His departure on the night of the Qwen3.5 small-model launch prompted immediate public statements from colleagues; at least two other Qwen team members updated their bios to "former" within 24 hours. His replacement comes from an industrial AI background, with a mandate focused on DAU metrics and alignment with China's national AI research priorities under the 14th Five-Year Plan.
Key Technical Concepts. Qwen3.5's core architectural innovation is a hybrid Gated DeltaNet attention scheme: three out of four transformer blocks use linear attention via Gated Delta Networks — which combine Mamba2's gating mechanism with a delta rule for updating hidden states — while every fourth block uses full standard attention. This hybrid design delivers near-linear context processing cost with a 1M-token context window. The model is also natively multimodal from pretraining (early fusion across text, image, and video), not a vision adapter bolt-on, supporting 201 languages at 60% lower inference cost than its predecessor. Benchmark context: independent real-world coding evaluation on r/LocalLLaMA found that while the 397B holds up on hard tasks, it collapses on master-level multi-file agentic work — a failure mode not captured by Alibaba's self-reported SWE-bench numbers. The Qwen3 technical report (precursor architecture) is on arXiv:2505.09388. The broader architectural lineage: DeltaNet parallel implementation and the flash-linear-attention library.
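The hybrid scheme can be sketched concretely. `gated_delta_step` below is a heavily simplified, scalar-gated reduction of the published Gated DeltaNet recurrence (the names and shapes are mine, not Qwen's; real kernels live in the flash-linear-attention library), and `hybrid_layer_types` just encodes the three-linear-to-one-full block layout described above:

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent step of a simplified gated delta rule. The state S is
    decayed by a Mamba2-style gate alpha, then corrected toward storing the
    pair (k, v) at write rate beta. Illustrative only, not Qwen's code."""
    S = alpha * S                             # gated decay of the running state
    pred = S @ k                              # what the state currently recalls for k
    return S + beta * np.outer(v - pred, k)   # delta-rule correction toward v

def hybrid_layer_types(n_layers, period=4):
    """Block layout from the text: every `period`-th transformer block uses
    full softmax attention, the rest use linear attention."""
    return ["full" if (i + 1) % period == 0 else "linear" for i in range(n_layers)]

# After one write with a unit-norm key, the state recalls v exactly:
k = np.array([1.0, 0.0])
v = np.array([2.0, 3.0])
S = gated_delta_step(np.zeros((2, 2)), k, v, alpha=1.0, beta=1.0)
print(S @ k)                  # recovers v
print(hybrid_layer_types(8))  # three "linear" blocks per "full" block
```

The delta rule is the key difference from plain linear attention: instead of blindly accumulating outer products, each write first subtracts what the state already predicts for that key, which is what lets a fixed-size state stay useful over a 1M-token context.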
Notes: No dedicated arXiv paper for Qwen3.5 exists yet; the primary citation is the official blog post. The Qwen3 technical report (arXiv:2505.09388) is the closest precursor. ~20 verified links total. The format was adapted for a news/analysis episode: with no single paper to anchor on, the model's HuggingFace page stands in that role.
By Daily Tech Feed