
Why it matters. On the night Alibaba shipped Qwen3.5 — a 397-billion-parameter sparse mixture-of-experts model with 17B active parameters, a 1M-token context window, and a small-model family the open-source community had been waiting for — they fired the person who built it. This is the story of what happens when a corporate open-source champion builds something so good it undermines the product his employer is trying to sell.
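The "397 billion total, 17B active" split comes from sparse mixture-of-experts routing: each token is sent to only a few expert networks, so most parameters sit idle on any given forward pass. A minimal sketch of the idea, with the router, expert count, and dimensions all invented for illustration (Alibaba has not published Qwen3.5's routing internals):

```python
import numpy as np

def topk_route(logits, k=2):
    """Return the indices of the k highest-scoring experts plus softmax
    weights over just those k. Toy router: expert count, k, and dimensions
    below are made up, not Qwen3.5's published configuration."""
    idx = np.argsort(logits)[::-1][:k]            # top-k expert indices
    w = np.exp(logits[idx] - logits[idx].max())   # numerically stable softmax
    return idx, w / w.sum()

def moe_forward(x, experts, router_w, k=2):
    """Run a token through only its top-k experts and mix their outputs.
    Only k of len(experts) weight matrices are touched per token, which is
    why a sparse MoE can hold 397B parameters but activate only ~17B."""
    idx, weights = topk_route(router_w @ x, k)
    return sum(w * (experts[i] @ x) for i, w in zip(idx, weights))

rng = np.random.default_rng(0)
n_experts, d = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal(d)
y = moe_forward(x, experts, router_w, k=2)  # 2 of 8 experts do any work
```

Per token, only `k` expert matrices are multiplied, so the active-parameter count scales with `k / n_experts` rather than with total model size.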
Alibaba / Qwen Team. Qwen3.5 was released February 16, 2026 by the Qwen Team at Alibaba Cloud. The flagship Qwen3.5-397B-A17B is available on HuggingFace under Apache 2.0. The small-model family (4B through 35B) runs on consumer hardware — 8B on a single GPU, 14B on a gaming rig, 32B on a Mac Studio. Full model collection at Qwen on HuggingFace. The official blog post is at qwen.ai. Coverage: TechCrunch, Bloomberg, VentureBeat, OfficeChai.
The Researchers. Junyang Lin (@JustinLin610) was the tech lead of the Qwen team at Alibaba Group, with an affiliation at Peking University, and the person who kept the open-weight commitment alive through years of internal corporate pressure. His departure on the night of the Qwen3.5 small-model launch prompted immediate public statements from colleagues; at least two other Qwen team members updated their bios to "former" within 24 hours. His replacement comes from an industrial AI background, with a mandate focused on DAU metrics and alignment with China's national AI research priorities under the 14th Five-Year Plan.
Key Technical Concepts. Qwen3.5's core architectural innovation is a hybrid Gated DeltaNet attention scheme: three out of four transformer blocks use linear attention via Gated Delta Networks — which combine Mamba2's gating mechanism with a delta rule for updating hidden states — while every fourth block uses full standard attention. This hybrid design delivers near-linear context processing cost with a 1M-token context window. The model is also natively multimodal from pretraining (early fusion across text, image, and video), not a vision adapter bolt-on, supporting 201 languages at 60% lower inference cost than its predecessor. Benchmark context: independent real-world coding evaluation on r/LocalLLaMA found that while the 397B holds up on hard tasks, it collapses on master-level multi-file agentic work — a failure mode not captured by Alibaba's self-reported SWE-bench numbers. The Qwen3 technical report (precursor architecture) is on arXiv:2505.09388. The broader architectural lineage: DeltaNet parallel implementation and the flash-linear-attention library.
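The hybrid scheme can be sketched concretely. `gated_delta_step` below is a heavily simplified, scalar-gated reduction of the published Gated DeltaNet recurrence (the names and shapes are mine, not Qwen's; real kernels live in the flash-linear-attention library), and `hybrid_layer_types` just encodes the three-linear-to-one-full block layout described above:

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent step of a simplified gated delta rule. The state S is
    decayed by a Mamba2-style gate alpha, then corrected toward storing the
    pair (k, v) at write rate beta. Illustrative only, not Qwen's code."""
    S = alpha * S                             # gated decay of the running state
    pred = S @ k                              # what the state currently recalls for k
    return S + beta * np.outer(v - pred, k)   # delta-rule correction toward v

def hybrid_layer_types(n_layers, period=4):
    """Block layout from the text: every `period`-th transformer block uses
    full softmax attention, the rest use linear attention."""
    return ["full" if (i + 1) % period == 0 else "linear" for i in range(n_layers)]

# After one write with a unit-norm key, the state recalls v exactly:
k = np.array([1.0, 0.0])
v = np.array([2.0, 3.0])
S = gated_delta_step(np.zeros((2, 2)), k, v, alpha=1.0, beta=1.0)
print(S @ k)                  # recovers v
print(hybrid_layer_types(8))  # three "linear" blocks per "full" block
```

The delta rule is the key difference from plain linear attention: instead of blindly accumulating outer products, each write first subtracts what the state already predicts for that key, which is what lets a fixed-size state stay useful over a 1M-token context.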
Notes: No dedicated arXiv paper for Qwen3.5 exists yet; the primary citation is the official blog post. The Qwen3 technical report (arXiv:2505.09388) is the closest precursor. ~20 verified links total. The format was adapted for a news/analysis episode: with no single paper to anchor on, the model's HuggingFace page stands in that role.
By Daily Tech Feed