June 03, 2025

2025.06.03 | High-entropy tokens improve LLM reasoning; Reasoning Gym provides environments for reinforcement learning.

10 minutes

The 15 papers in this episode:

[00:22] 🧠 Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

[01:05] 🧠 REASONING GYM: Reasoning Environments for Reinforcement Learning with Verifiable Rewards

[01:46] 🤖 SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

[02:31] 🚀 Taming LLMs by Scaling Learning Rates with Gradient Grouping

[03:19] 🧩 Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles

[04:06] 🎬 Temporal In-Context Fine-Tuning for Versatile Control of Video Diffusion Models

[04:43] 🤖 ARIA: Training Language Agents with Intention-Driven Reward Aggregation

[05:27] 🤖 LoHoVLA: A Unified Vision-Language-Action Model for Long-Horizon Embodied Tasks

[06:02] 🤖 ShapeLLM-Omni: A Native Multimodal LLM for 3D Generation and Understanding

[06:41] 🤖 Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control

[07:15] 🚀 AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

[07:56] 🌍 EarthMind: Towards Multi-Granular and Multi-Sensor Earth Observation with Large Multimodal Models

[08:35] 🤔 SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning

[09:14] 🤖 MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

[09:48] 🤖 Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (RED): AI速递

HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

By duan

Rating: 5 (22 ratings)