HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.09.03 | Agentic RL boosts LLM autonomy; SimpleTIR tackles multi-turn tool-integrated reasoning



The 15 papers in this episode are:

[00:19] 🤖 The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

[00:40] 🚀 SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning

[01:12] 🤖 UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

[01:41] 🎥 ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

[02:12] 🔄 LLaVA-Critic-R1: Your Critic Model is Secretly a Strong Policy Model

[02:43] 🔧 VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use

[03:11] 📄 POINTS-Reader: Distillation-Free Adaptation of Vision-Language Models for Document Conversion

[03:33] 🩺 Baichuan-M2: Scaling Medical Capability with Large Verifier System

[03:57] 🎥 Kwai Keye-VL 1.5 Technical Report

[04:20] 🤖 Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR

[04:45] 🧠 Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic

[05:11] 🔄 Jointly Reinforcing Diversity and Quality in Language Model Generations

[05:42] 🚀 DCPO: Dynamic Clipping Policy Optimization

[06:04] 🚀 OpenVision 2: A Family of Generative Pretrained Visual Encoders for Multimodal Learning

[06:27] 🎬 GenCompositor: Generative Video Compositing with Diffusion Transformer

【Follow Us】

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (小红书): AI速递


By duan


