Awesome Agents Podcast

vLLM 0.17 Ships FlashAttention 4 and Live MoE Scaling



vLLM v0.17.0 adds FlashAttention 4, elastic expert parallelism for live MoE rescaling, full Qwen3.5 support, and a performance-mode flag, all in 699 commits from 272 contributors.

By Awesome Agents