May 23, 2025

2025.05.23 | Agents accelerate scientific research; reasoning models follow instructions poorly.

11 minutes

The 15 papers in this episode are:

[00:22] 🧪 NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification

[01:05] 🤔 Scaling Reasoning, Losing Control: Evaluating Instruction Following in Large Reasoning Models

[01:50] 🤖 Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning

[02:30] 🖼 KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models

[03:16] 🖼 Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

[04:03] ⏱ QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design

[04:55] 🖼 GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

[05:39] 🖼 LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning

[06:15] 📉 Risk-Averse Reinforcement Learning with Itakura-Saito Loss

[06:54] 🚀 Scaling Diffusion Transformers Efficiently via μP

[07:33] 🖼 Understanding Generative AI Capabilities in Everyday Image Editing Tasks

[08:19] 🧠 Let LLMs Break Free from Overthinking via Self-Braking Tuning

[08:56] 🧠 Mind the Gap: Bridging Thought Leap for Improved Chain-of-Thought Tuning

[09:37] 🎮 VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance

[10:23] 💡 Dimple: Discrete Diffusion Multimodal Large Language Model with Parallel Decoding

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递

By duan