【Sponsor】
Listen to AI每周谈 on your commute. Every week, AI每周谈 recaps the previous week's biggest AI news.
Link 🔗https://www.xiaoyuzhoufm.com/podcast/688a34636f5a275f1cba40fd
【Contents】
This episode covers the following 15 papers:
[00:33] 🧬 SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
[01:24] 🔢 When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models
[02:22] 🎨 MegaStyle: Constructing Diverse and Scalable Style Dataset via Consistent Text-to-Image Style Mapping
[03:15] 🤖 HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents
[04:07] 🧠 Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability
[04:52] 🤖 ClawBench: Can AI Agents Complete Everyday Online Tasks?
[05:31] 📱 KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
[06:18] 🧠 Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering
[07:09] 🎭 LPM 1.0: Video-based Character Performance Model
[07:58] 🧠 OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence
[08:50] 🧠 Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
[09:41] ⚡ DMax: Aggressive Parallel Decoding for dLLMs
[10:20] 🧠 Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
[11:02] 🧩 OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
[11:41] 🧠 OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
【Follow Us】
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu (RED): AI速递