


The 15 papers in this episode are:
[00:26] 🚀 Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer
[01:00] 🤔 REASONEDIT: Towards Reasoning-Enhanced Image Editing Models
[01:25] 🎬 AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement
[01:59] 🌉 Vision Bridge Transformer at Scale
[02:35] 🔍 Architecture Decoupling Is Not All You Need For Unified Multimodal Model
[03:23] ⚡ DiP: Taming Diffusion Models in Pixel Space
[03:49] 🧠 Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models
[04:19] 🤖 DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action
[05:02] ⚡ Adversarial Flow Models
[05:29] 🔬 Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield
[06:10] 🎥 Captain Safari: A World Engine
[06:43] 🌍 World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models
[07:20] 🔍 The Collapse of Patches
[07:50] 🔍 RefineBench: Evaluating Refinement Capability of Language Models via Checklists
[08:23] 🦷 OralGPT-Omni: A Versatile Dental Multimodal Large Language Model
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递
By duan5
