May 30, 2025

2025.05.30 | Inference-time scaling improves table reasoning; multimodal models' video feedback still needs work.

11 minutes

The 15 papers in this episode are:

[00:22] 📊 Table-R1: Inference-Time Scaling for Table Reasoning

[01:02] 🤖 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos

[01:45] 🧠 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence

[02:25] 🧠 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason

[03:11] 🤖 ZeroGUI: Automating Online GUI Learning at Zero Human Cost

[03:45] 🤔 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?

[04:39] 🧬 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering

[05:15] 🤔 Are Reasoning Models More Prone to Hallucination?

[05:51] 🤖 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning

[06:29] 🎨 D-AR: Diffusion via Autoregressive Models

[07:16] 📸 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

[07:53] 🛠 SWE-bench Goes Live! (SWE-bench-Live: a continuously updated benchmark for issue resolution)

[08:36] 💡 Multi-Domain Explainability of Preferences

[09:16] 🤖 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

[10:01] 🗣 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu: AI速递

HuggingFace 每日AI论文速递

By duan

5

22 ratings
