December 01, 2025

2025.12.01 | Z-Image takes the crown efficiently with few parameters; REASONEDIT thinks before it paints and tops the charts

9 minutes

The 15 papers in this episode:

[00:26] 🚀 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer

[01:00] 🤔 REASONEDIT: Towards Reasoning-Enhanced Image Editing Models

[01:25] 🎬 AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement

[01:59] 🌉 Vision Bridge Transformer at Scale

[02:35] 🔍 Architecture Decoupling Is Not All You Need For Unified Multimodal Model

[03:23] ⚡ DiP: Taming Diffusion Models in Pixel Space

[03:49] 🧠 Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models

[04:19] 🤖 DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action

[05:02] ⚡ Adversarial Flow Models

[05:29] 🔬 Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield

[06:10] 🎥 Captain Safari: A World Engine

[06:43] 🌍 World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models

[07:20] 🔍 The Collapse of Patches

[07:50] 🔍 RefineBench: Evaluating Refinement Capability of Language Models via Checklists

[08:23] 🦷 OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

【Follow us】

You can also find us on the following platform for more information beyond the podcast content:

Xiaohongshu: AI速递


By duan
