February 25, 2025

2025.02.25 | Innovations in long-context optimization; efficient, general-purpose visual diffusion.

14 minutes

The 20 papers in this episode are:

[00:27] 📖 Thus Spake Long-Context Large Language Model

[01:09] 🌈 DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

[01:48] 🚀 Slamming: Training a Speech Language Model on One GPU in a Day

[02:32] 🎥 VideoGrain: Modulating Space-Time Attention for Multi-grained Video Editing

[03:11] 🎧 Audio-FLAN: A Preliminary Release

[03:43] 🧠 CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models

[04:28] 🎨 GCC: Generative Color Constancy via Diffusing a Color Checker

[05:11] 📊 Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning

[05:57] 🚀 Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment

[06:38] 🧠 Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

[07:23] 🎥 RIFLEx: A Free Lunch for Length Extrapolation in Video Diffusion Transformers

[08:01] 📱 Mobile-Agent-V: Learning Mobile Device Operation Through Video-Guided Multi-Agent Collaboration

[08:45] ⏳ Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties

[09:31] 🤖 Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation

[10:02] 🔄 Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam

[10:43] 📝 Can Community Notes Replace Professional Fact-Checkers?

[11:24] 📈 Forecasting Open-Weight AI Model Growth on Hugging Face

[12:08] 🔑 Beyond Release: Access Considerations for Generative AI Systems

[12:49] 🌐 TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning

[13:30] 💃 X-Dancer: Expressive Music to Human Dance Video Generation

[Follow us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递

By duan
