
The 11 papers in this episode:
[00:24] 🚀 REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
[01:00] 🎥 MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
[01:40] 🔍 Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
[02:21] 🌍 Cosmos World Foundation Model Platform for Physical AI
[03:01] 🔍 LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
[03:40] 🎥 Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control
[04:22] 🎥 MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
[05:05] 📊 PPTAgent: Generating and Evaluating Presentations Beyond Text-to-Slides
[05:42] 🎭 MagicFace: High-Fidelity Facial Expression Editing with Action-Unit Control
[06:17] 🎥 Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers
[06:52] 🐬 Dolphin: Closed-loop Open-ended Auto-research through Thinking, Practice, and Feedback
【Follow us】
You can also find us on the following platforms for more information beyond the podcast itself
小红书 (Xiaohongshu): AI速递