The 14 papers in this episode are:
[00:26] 🎬 Kling-Omni Technical Report
[01:02] 🚀 LLaDA2.0: Scaling Up Diffusion Language Models to 100B
[01:41] 🔮 Next-Embedding Prediction Makes Strong Vision Learners
[02:27] 👓 StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors
[02:58] 🎬 Seedance 1.5 pro: A Native Audio-Visual Joint Generation Foundation Model
[03:34] 🔭 Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation
[04:11] 📸 Generative Refocusing: Flexible Defocus Control from a Single Image
[04:56] 🤖 Adaptation of Agentic AI
[05:36] ⚗ Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection
[06:12] 🛡 DeContext as Defense: Safe Image Editing in Diffusion Transformers
[06:58] 🧭 N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
[07:49] 🎨 The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
[08:30] 🔧 AdaTooler-V: Adaptive Tool-Use for Images and Videos
[09:19] 🤔 Exploration vs. Exploitation: Rethinking RLVR through Clipping, Entropy, and Spurious Reward
[Follow Us]
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu (小红书): AI速递
By duan5