
The 15 papers in this episode are:
[00:22] 🤖 The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
[00:56] 🛣 R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
[01:40] 🧠 Skywork Open Reasoner 1 Technical Report
[02:20] 🔍 Sherlock: Self-Correcting Reasoning in Vision-Language Models
[02:55] 🤖 Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO
[03:35] 🤖 SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
[04:25] 🚀 SageAttention2++: A More Efficient Implementation of SageAttention2
[05:12] 🧠 Advancing Multimodal Reasoning via Reinforcement Learning with Cold Start
[05:59] 🎬 Fostering Video Reasoning via Next-Event Prediction
[06:42] 💡 RenderFormer: Transformer-based Neural Rendering of Triangle Meshes with Global Illumination
[07:25] 🔬 DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research
[08:16] 🖼 Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
[08:58] 🧩 Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs
[09:38] 🚚 SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
[10:26] 🌐 Judging Quality Across Languages: A Multilingual Approach to Pretraining Data Filtering with Language Models
[Follow us]
You can also find us on the following platforms for more information beyond the podcast content:
Xiaohongshu: AI速递