HuggingFace 每日AI论文速递

2024.12.03 每日AI论文 | X-Prompt提升图像生成,GATE OpenING评估图文生成。


Listen Later

本期的 24 篇论文如下:

[00:23] 🖼 X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models(X-Prompt:面向自回归视觉语言基础模型的通用上下文图像生成)

[00:58] 📊 GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation(GATE 开放:一个综合基准用于评估开放式交错图文生成)

[01:32] 🖼 Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis(Switti:为文本到图像合成设计尺度变换器)

[02:09] 🎥 Open-Sora Plan: Open-Source Large Video Generation Model(开放Sora计划:开源大型视频生成模型)

[02:55] 🎥 TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video(TAPTRv3:时空上下文增强长视频中任意点的鲁棒跟踪)

[03:37] 🤖 o1-Coder: an o1 Replication for Coding(o1-Coder:一个面向编码任务的o1模型复现)

[04:12] 🤖 SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters(SOLAMI:沉浸式互动的3D自主角色社交视觉-语言-动作建模)

[04:49] 🎥 VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation(VISTA:通过视频时空增强提升长时和高分辨率视频理解)

[05:38] 🔍 TinyFusion: Diffusion Transformers Learned Shallow(微型融合:浅层扩散变换器的学习)

[06:19] 🔍 VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models(VLsI:从大型到小型视觉语言模型的层级交互化)

[06:52] 🎙 FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait(FLOAT:基于生成运动潜在流匹配的音频驱动说话人像)

[07:32] 🚀 Efficient Track Anything(高效追踪任何目标)

[08:15] 🌊 Steering Rectified Flow Models in the Vector Field for Controlled Image Generation(在矢量场中引导校正流模型以实现受控图像生成)

[08:50] 🎥 Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation(长视频扩散生成与分段交叉注意力及内容丰富的视频数据集构建)

[09:33] 📹 WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model(WF-VAE:通过小波驱动的能量流动增强视频VAE以用于潜在视频扩散模型)

[10:11] 🔍 VLSBench: Unveiling Visual Leakage in Multimodal Safety(VLSBench:揭示多模态安全中的视觉泄露问题)

[10:51] 🧠 VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information(VisOnlyQA:大型视觉语言模型在几何信息视觉感知方面仍存在困难)

[11:41] 🎮 PhysGame: Uncovering Physical Commonsense Violations in Gameplay Videos(PhysGame:揭示游戏视频中的物理常识违规)

[12:14] 🗣 Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input(协作实例导航:利用代理自我对话最小化用户输入)

[12:51] 🌍 INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge(评估多语言理解能力:基于区域知识)

[13:28] 🎨 Art-Free Generative Models: Art Creation Without Graphic Art Knowledge(无艺术生成模型:无需图形艺术知识的艺术创作)

[14:02] 📈 A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models(大型语言模型测试时计算的简单可证明缩放定律)

[14:41] 🌐 World-consistent Video Diffusion with Explicit 3D Modeling(世界一致性视频扩散与显式3D建模)

[15:22] 🔊 Towards Cross-Lingual Audio Abuse Detection in Low-Resource Settings with Few-Shot Learning(面向低资源环境下跨语言音频滥用检测的小样本学习)

【关注我们】

您还可以在以下平台找到我们,获得播客内容以外更多信息

小红书: AI速递

...more
View all episodesView all episodes
Download on the App Store

HuggingFace 每日AI论文速递By duan

  • 5
  • 5
  • 5
  • 5
  • 5

5

2 ratings


More shows like HuggingFace 每日AI论文速递

View all
硅谷101|中国版 by 泓君Jane

硅谷101|中国版

56 Listeners

商业就是这样 by 商业就是这样

商业就是这样

292 Listeners

声动早咖啡 by 声动活泼

声动早咖啡

293 Listeners

思文,败类 by 思文败类

思文,败类

157 Listeners

不开玩笑 Jokes Aside by 不开玩笑JokesAside

不开玩笑 Jokes Aside

136 Listeners

人民公园说AI by JustSayAI

人民公园说AI

7 Listeners

數創實驗室 - AI時代的學習指南 by Vincent在數創

數創實驗室 - AI時代的學習指南

1 Listeners

AI可可AI生活 by fly51fly

AI可可AI生活

0 Listeners