


The 15 papers in this episode:
[00:20] 🧠 Memory in the Age of AI Agents
[00:57] 🚀 Towards Scalable Pre-training of Visual Tokenizers for Generation
[01:42] 🎬 LongVie 2: Multimodal Controllable Ultra-Long Video World Model
[02:41] ⚡ ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding
[03:11] 🧪 NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents
[03:53] ⚡ Error-Free Linear Attention is a Free Lunch: Exact Solution from Continuous-Time Dynamics
[04:29] 🎬 KlingAvatar 2.0 Technical Report
[05:17] 🧠 QwenLong-L1.5: Post-Training Recipe for Long-Context Reasoning and Memory Management
[05:57] 🧠 MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment
[06:35] 🤖 Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge
[07:14] 🤖 Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos
[07:46] 🔍 V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions
[08:30] 👁 Toward Ambulatory Vision: Learning Visually-Grounded Active View Selection
[09:14] 🌳 WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment
[09:58] 🛡 VLSA: Vision-Language-Action Models with Plug-and-Play Safety Constraint Layer
[Follow us]
You can also find us on the following platform for more information beyond the podcast content
Xiaohongshu: AI速递
By duan5
