
Sign up to save your podcasts
Or
本期的 15 篇论文如下:
[00:22] 📊 Table-R1: Inference-Time Scaling for Table Reasoning(Table-R1:表格推理的推理时扩展)
[01:02] 🤖 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos(VF-Eval:评估多模态大语言模型生成AIGC视频反馈的能力)
[01:45] 🧠 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence(Spatial-MLLM:提升多模态大语言模型在基于视觉的空间智能方面的能力)
[02:25] 🧠 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason(行胜于言:论证推理学习中的噪声奖励)
[03:11] 🤖 ZeroGUI: Automating Online GUI Learning at Zero Human Cost(ZeroGUI:零人工成本的在线GUI学习自动化)
[03:45] 🤔 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?(VideoReasonBench:多模态大语言模型能否执行以视觉为中心的复杂视频推理?)
[04:39] 🧬 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering(Satori-SWE: 面向高效软件工程的演化测试时扩展)
[05:15] 🤔 Are Reasoning Models More Prone to Hallucination?(推理模型更容易产生幻觉吗?)
[05:51] 🤖 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning(cadrille:基于在线强化学习的多模态CAD重建)
[06:29] 🎨 D-AR: Diffusion via Autoregressive Models(D-AR:基于自回归模型的扩散)
[07:16] 📸 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views(AnySplat:来自非约束视角的Feed-forward 3D高斯溅射)
[07:53] 🛠 SWE-bench Goes Live!(SWE-bench-Live:一个实时更新的问题解决基准评测)
[08:36] 💡 Multi-Domain Explainability of Preferences(偏好的多领域可解释性)
[09:16] 🤖 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning(UniRL:基于监督学习和强化学习的自提升统一多模态模型)
[10:01] 🗣 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian(FAMA:首个面向英语和意大利语的大规模开放科学语音基础模型)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递
本期的 15 篇论文如下:
[00:22] 📊 Table-R1: Inference-Time Scaling for Table Reasoning(Table-R1:表格推理的推理时扩展)
[01:02] 🤖 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos(VF-Eval:评估多模态大语言模型生成AIGC视频反馈的能力)
[01:45] 🧠 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence(Spatial-MLLM:提升多模态大语言模型在基于视觉的空间智能方面的能力)
[02:25] 🧠 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason(行胜于言:论证推理学习中的噪声奖励)
[03:11] 🤖 ZeroGUI: Automating Online GUI Learning at Zero Human Cost(ZeroGUI:零人工成本的在线GUI学习自动化)
[03:45] 🤔 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning?(VideoReasonBench:多模态大语言模型能否执行以视觉为中心的复杂视频推理?)
[04:39] 🧬 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering(Satori-SWE: 面向高效软件工程的演化测试时扩展)
[05:15] 🤔 Are Reasoning Models More Prone to Hallucination?(推理模型更容易产生幻觉吗?)
[05:51] 🤖 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning(cadrille:基于在线强化学习的多模态CAD重建)
[06:29] 🎨 D-AR: Diffusion via Autoregressive Models(D-AR:基于自回归模型的扩散)
[07:16] 📸 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views(AnySplat:来自非约束视角的Feed-forward 3D高斯溅射)
[07:53] 🛠 SWE-bench Goes Live!(SWE-bench-Live:一个实时更新的问题解决基准评测)
[08:36] 💡 Multi-Domain Explainability of Preferences(偏好的多领域可解释性)
[09:16] 🤖 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning(UniRL:基于监督学习和强化学习的自提升统一多模态模型)
[10:01] 🗣 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian(FAMA:首个面向英语和意大利语的大规模开放科学语音基础模型)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递