


The 15 papers in this episode are:
[00:24] 🎬 Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
[00:55] 🚀 Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform
[01:32] 🎬 Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality
[02:13] 🎬 OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
[02:49] ⚡ ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
[03:45] 🤖 MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment
[04:47] 🚀 Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training
[05:18] 🌲 TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
[05:55] 🚀 From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs
[06:30] 📊 EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce
[07:02] 🧩 Modular Neural Image Signal Processing
[07:33] 🧭 Ground Slow, Move Fast: A Dual-System Foundation Model for Generalizable Vision-and-Language Navigation
[08:16] 🤖 DeepCode: Open Agentic Coding
[08:48] 🎯 TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels
[09:30] 🎬 Efficiently Reconstructing Dynamic Scenes One D4RT at a Time
[Follow us]
You can also find us on the following platform for more information beyond the podcast:
Xiaohongshu (RED): AI速递
By duan5