December 15, 2025

2025.12.15 | A small dental model punches above its weight; diffusion models drop the VAE

9 minutes

The 14 papers in this episode are as follows:

[00:22] 🦷 DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry

[00:53] 🎨 SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder

[01:41] 🎥 EgoX: Egocentric Video Generation from a Single Exocentric Video

[02:26] 🎬 V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties

[03:03] 🔍 Sliding Window Attention Adaptation

[03:43] 🎬 PersonaLive! Expressive Portrait Image Animation for Live Streaming

[04:10] 🎬 Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation

[04:41] 🎨 Exploring MLLM-Diffusion Information Transfer with MetaCanvas

[05:18] 🔄 MeshSplatting: Differentiable Rendering with Opaque Meshes

[06:02] 🤖 LEO-RobotAgent: A General-purpose Robotic Agent for Language-driven Embodied Operator

[06:39] ⚡ The N-Body Problem: Parallel Execution from Single-Person Egocentric Video

[07:11] 🧬 CheXmask-U: Quantifying uncertainty in landmark-based anatomical segmentation for X-ray images

[07:52] 🏆 Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge

[08:32] 🚀 Sharp Monocular View Synthesis in Less Than a Second

[Follow us]

You can also find us on the following platform for more information beyond the podcast itself:

Xiaohongshu: AI速递

HuggingFace 每日AI论文速递

By duan

Rating: 5 (22 ratings)
