
The 14 papers in this episode are:
[00:22] 🦷 DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry
[00:53] 🎨 SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder
[01:41] 🎥 EgoX: Egocentric Video Generation from a Single Exocentric Video
[02:26] 🎬 V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties
[03:03] 🔍 Sliding Window Attention Adaptation
[03:43] 🎬 PersonaLive! Expressive Portrait Image Animation for Live Streaming
[04:10] 🎬 Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation
[04:41] 🎨 Exploring MLLM-Diffusion Information Transfer with MetaCanvas
[05:18] 🔄 MeshSplatting: Differentiable Rendering with Opaque Meshes
[06:02] 🤖 LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator
[06:39] ⚡ The N-Body Problem: Parallel Execution from Single-Person Egocentric Video
[07:11] 🧬 CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images
[07:52] 🏆 Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge
[08:32] 🚀 Sharp Monocular View Synthesis in Less Than a Second
[Follow us]
You can also find us on the following platform for more information beyond the podcast content
Xiaohongshu: AI速递
By duan5
