October 10, 2025

2025.10.10 | Agent Learning from Early Experience; Interleaved Image-Text Reflective Reasoning Chains Jump to 24.9%

10 minutes

The 14 papers in this episode:

[00:16] 🌱 Agent Learning via Early Experience

[00:50] 🧠 MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization

[01:42] 🧪 From What to Why: A Multi-Agent System for Evidence-based Chemical Reaction Condition Reasoning

[02:19] 🎬 UniVideo: Unified Understanding, Generation, and Editing for Videos

[03:01] 🧠 When Thoughts Meet Facts: Reusable Reasoning for Long-Context LMs

[03:43] 🧠 Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning

[04:25] 🧠 MemMamba: Rethinking Memory Patterns in State Space Model

[05:17] 🛡 The Alignment Waltz: Jointly Training Agents to Collaborate for Safety

[05:53] 🎯 Hybrid Reinforcement: When Reward Is Sparse, It's Better to Be Dense

[06:40] 🧪 NewtonBench: Benchmarking Generalizable Scientific Law Discovery in LLM Agents

[07:17] 🪚 DeepPrune: Parallel Scaling without Inter-trace Redundancy

[07:54] 🚀 Training-Free Group Relative Policy Optimization

[08:24] 🪄 ARTDECO: Towards Efficient and High-Fidelity On-the-Fly 3D Reconstruction with Structured Scene Representation

[08:55] 🤥 LLMs Learn to Deceive Unintentionally: Emergent Misalignment in Dishonesty from Misaligned Samples to Biased Human-AI Interactions

[Follow Us]

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (RED): AI速递
