January 05, 2025

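[Month-End Special] December's Hottest AI Papers | Qwen2.5 boosts large language model performance, Apollo improves video understanding efficiency.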

23 minutes

The 10 papers in this episode:

[00:31] TOP1(🔥335) | 🤖 Qwen2.5 Technical Report

[02:44] TOP2(🔥136) | 🎥 Apollo: An Exploration of Video Understanding in Large Multimodal Models

[05:01] TOP3(🔥123) | 🚀 Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

[07:18] TOP4(🔥121) | 🔄 PaliGemma 2: A Family of Versatile VLMs for Transfer

[09:38] TOP5(🔥116) | 🚀 Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference

[12:21] TOP6(🔥108) | 🚀 SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance

[14:42] TOP7(🔥105) | 🔍 VisionZip: Longer is Better but Not Necessary in Vision Language Models

[16:51] TOP8(🔥96) | 🧠 Phi-4 Technical Report

[18:55] TOP9(🔥92) | 🎥 InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

[21:02] TOP10(🔥91) | 🧠 Are Your LLMs Capable of Stable Reasoning?

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递
