HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.10.02 | MCTS Breaks Through the RLVR Bottleneck; GEM, an Open-Source Gym for Agents



The 15 papers in this episode are:

[00:19] 🧠 DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

[01:20] 🤖 GEM: A Gym for Agentic LLMs

[01:57] 🧠 VLA-RFT: Vision-Language-Action Reinforcement Fine-tuning with Verified Rewards in World Simulators

[02:36] 🎒 Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation

[03:06] 🎬 Code2Video: A Code-centric Paradigm for Educational Video Generation

[03:41] ⚙ PIPer: On-Device Environment Setup via Online Reinforcement Learning

[04:11] 🗜 ACON: Optimizing Context Compression for Long-horizon LLM Agents

[04:52] 🔍 Why Can't Transformers Learn Multiplication? Reverse-Engineering Reveals Long-Range Dependency Pitfalls

[05:22] ⚖ BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

[06:01] ⚡ Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution

[06:42] 🚀 BroRL: Scaling Reinforcement Learning via Broadened Exploration

[07:25] 📊 Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum

[08:02] 🎯 On Predictability of Reinforcement Learning Dynamics for Large Language Models

[08:31] 🖥 GUI-KV: Efficient GUI Agents via KV Cache with Spatio-Temporal Awareness

[09:17] 🧠 Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast itself.

Xiaohongshu: AI速递


HuggingFace 每日AI论文速递, by duan
