


The paper presents the Qwen2-VL series, an advanced family of Large Vision-Language Models (LVLMs) developed by the Qwen Team at Alibaba Group. The models come in three parameter sizes (2B, 7B/8B, and 72B) and achieve state-of-the-art results across a variety of multimodal benchmarks, with the flagship 72B model performing comparably to leading proprietary models such as GPT-4o and Claude 3.5 Sonnet.
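The models achieve this performance through two primary architectural innovations: Naive Dynamic Resolution, which lets the vision encoder turn images of arbitrary resolution into a variable number of visual tokens instead of resizing every image to one fixed size, and Multimodal Rotary Position Embedding (M-RoPE), which splits positional encoding into temporal, height, and width components so that text, images, and video share a single positional scheme.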
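Naive Dynamic Resolution means the number of visual tokens grows with the input image rather than being fixed. The sketch below estimates that count under stated assumptions: a ViT patch size of 14, an MLP that merges each 2×2 block of neighboring patch tokens into one token, side lengths rounded to the nearest multiple of 28, and two special tokens marking the start and end of the image. The helper names `round_to_factor` and `visual_token_count` are hypothetical, and any pixel-budget limits a real preprocessing pipeline would apply are not modeled.

```python
# Assumed constants for illustration only; not taken from any library API.
PATCH_SIZE = 14                   # ViT patch side length in pixels
MERGE_SIZE = 2                    # each 2x2 block of patch tokens is merged into one token
FACTOR = PATCH_SIZE * MERGE_SIZE  # 28: image sides are rounded to multiples of this


def round_to_factor(x: int, factor: int = FACTOR) -> int:
    """Hypothetical helper: round a side length to the nearest multiple of `factor` (minimum one factor)."""
    return max(factor, round(x / factor) * factor)


def visual_token_count(height: int, width: int, boundary_tokens: int = 2) -> int:
    """Estimate how many LLM input tokens one image occupies under the assumptions above.

    `boundary_tokens` models the special start/end tokens wrapped around the image.
    """
    h = round_to_factor(height)
    w = round_to_factor(width)
    patches = (h // PATCH_SIZE) * (w // PATCH_SIZE)   # raw ViT patch tokens
    merged = patches // (MERGE_SIZE * MERGE_SIZE)     # tokens left after 2x2 merging
    return merged + boundary_tokens


for size in [(224, 224), (448, 448), (1080, 1920)]:
    print(size, visual_token_count(*size))
```

Under these assumptions a 224 × 224 image becomes 66 tokens and a 448 × 448 image becomes 258, so higher-resolution inputs consume proportionally more of the language model's context instead of being downsampled to a fixed grid.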
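These innovations unlock several powerful capabilities for Qwen2-VL: understanding images across a wide range of resolutions and aspect ratios, comprehending videos longer than 20 minutes, acting as an agent that operates devices such as mobile phones and robots, and reading multilingual text within images.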
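Handling images and video in one model relies on M-RoPE giving every token a (temporal, height, width) position triple. The sketch below builds those triples for a mixed sequence under these assumptions: text tokens use the same value in all three components (equivalent to ordinary 1D RoPE), visual tokens take their temporal ID from the frame index and their height and width IDs from their row and column in the merged patch grid, and each new segment starts one past the largest ID used so far. The segment format and the function names are hypothetical illustrations, not the model's actual preprocessing code, and `frames` stands for temporal positions after any temporal merging.

```python
from typing import List, Tuple

Position = Tuple[int, int, int]  # (temporal, height, width)


def text_positions(start: int, length: int) -> List[Position]:
    # Text: all three components share one ID, which reduces to standard 1D RoPE.
    return [(start + i, start + i, start + i) for i in range(length)]


def visual_positions(start: int, frames: int, rows: int, cols: int) -> List[Position]:
    # Image (frames == 1) or video (frames > 1): the temporal ID follows the frame index,
    # the height/width IDs follow the token's row/column in the merged patch grid.
    return [
        (start + t, start + r, start + c)
        for t in range(frames)
        for r in range(rows)
        for c in range(cols)
    ]


def build_mrope_ids(segments) -> List[Position]:
    """Hypothetical builder. `segments` holds ("text", length) or ("visual", frames, rows, cols).

    Each segment starts one past the largest position ID used by the previous segments.
    """
    ids: List[Position] = []
    next_start = 0
    for seg in segments:
        if seg[0] == "text":
            block = text_positions(next_start, seg[1])
        elif seg[0] == "visual":
            block = visual_positions(next_start, *seg[1:])
        else:
            raise ValueError(f"unknown segment type: {seg[0]}")
        ids.extend(block)
        if block:
            next_start = max(max(p) for p in block) + 1
    return ids


# Three text tokens, a 2 x 3 image grid, then two more text tokens.
ids = build_mrope_ids([("text", 3), ("visual", 1, 2, 3), ("text", 2)])
for p in ids:
    print(p)
```

In this example the six image tokens only span IDs 3 to 5, so the following text resumes at 6 rather than 9. Because a visual block uses fewer distinct position values than it has tokens, the scheme keeps position IDs comparatively small for large images and long videos.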
By Yun Wu