HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

2025.06.03 | High-entropy tokens improve LLM reasoning; Reasoning Gym optimizes reinforcement learning environments.


The 15 papers in this episode are:

[00:22] 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

[01:05] 🧠 REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

[01:46] 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

[02:31] 🚀 Taming LLMs by Scaling Learning Rates with Gradient Grouping

[03:19] 🧩 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

[04:06] 🎬 Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models

[04:43] 🤖 ARIA: Training Language Agents with Intention-Driven Reward Aggregation

[05:27] 🤖 LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks

[06:02] 🤖 ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

[06:41] 🤖 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

[07:15] 🚀 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

[07:56] 🌍 EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models

[08:35] 🤔 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

[09:14] 🤖 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

[09:48] 🤖 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

【Follow Us】

You can also find us on the following platform for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递

By duan