本期的 15 篇论文如下:
[00:23] 🧪 ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows(ScienceBoard:评估现实科学工作流程中的多模态自主Agent)
[01:09] 🤔 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs(MME-推理:多模态大型语言模型中逻辑推理的综合基准)
[01:51] 🖼 Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers(Paper2Poster:基于科研论文的多模态海报自动生成)
[02:28] 🎨 OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data(OmniConsistency:从配对风格化数据中学习与风格无关的一致性)
[03:06] 🎬 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation(OpenS2V-Nexus:一个用于主题驱动视频生成的详细基准和百万级数据集)
[03:50] 🧠 SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond(SynLogic:大规模合成可验证推理数据,用于学习逻辑推理及其他能力)
[04:32] 💡 Exploring the Latent Capacity of LLMs for One-Step Text Generation(探索大型语言模型在一步文本生成中的潜在能力)
[05:13] 🧠 VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization(VerIPO:通过验证器引导的迭代策略优化,培养视频大型语言模型中的长期推理能力)
[05:48] 🤔 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning(别想太多:偏好更短的思维链以提升大型语言模型的推理能力)
[06:29] 🤔 MMMR: Benchmarking Massive Multi-Modal Reasoning Tasks(MMMR:大规模多模态推理任务的基准测试)
[07:09] 🤖 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents(UI-Genie:一种迭代提升基于MLLM的移动GUI代理的自提升方法)
[07:52] 🎬 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation(Sparse VideoGen2:通过语义感知置换和稀疏注意力加速视频生成)
[08:28] 📹 MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios(MME-VideoOCR:评估多模态大型语言模型在视频场景中基于OCR的能力)
[09:16] 🧩 GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning(GraLoRA:用于参数高效微调的细粒度低秩适配)
[10:02] 🕵 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?(Video-Holmes:多模态大语言模型能否像福尔摩斯一样进行复杂的视频推理?)
【关注我们】
您还可以在以下平台找到我们,获得播客内容以外更多信息
小红书: AI速递