HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

2024.11.28 Daily AI Papers | Enhanced Instance Control, a Breakthrough in 4D Scene Generation



The 21 papers in this episode are:

[00:24] 🖼 ROICtrl: Boosting Instance Control for Visual Generation

[01:08] 🎥 CAT4D: Create Anything in 4D with Multi-View Video Diffusion Models

[01:55] 📚 Interleaved Scene Graph for Interleaved Text-and-Image Generation Assessment

[02:38] 🌐 MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation

[03:21] 🤖 Large Language Model-Brained GUI Agents: A Survey

[03:57] 🎨 DreamCache: Finetuning-Free Lightweight Personalized Image Generation via Feature Caching

[04:35] ⚡ Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient

[05:14] 🎥 Identity-Preserving Text-to-Video Generation by Frequency Decomposition

[05:47] 🚗 DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving

[06:31] 🔺 3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes

[07:10] 🎭 Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters

[07:48] 🎛 Omegance: A Single Parameter for Various Granularities in Diffusion-Based Synthesis

[08:26] 🦖 ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

[09:26] 🧍 UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing

[10:06] 🧠 Optimizing Brain Tumor Segmentation with MedNeXt: BraTS 2024 SSA and Pediatrics

[10:43] ⏱ Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding

[11:27] 🎙 VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format

[12:03] 🌟 Adaptive Blind All-in-One Image Restoration

[12:39] 🛡 Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing

[13:18] 🎥 Video-Guided Foley Sound Generation with Multimodal Controls

[13:48] 📚 Training and Evaluating Language Models with Template-based Data Generation

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (小红书): AI速递


By duan
