November 27, 2024

2024.11.27 Daily AI Papers | ShowUI improves GUI efficiency, F2F improves image editing.

12 minutes

The 18 papers in this episode are:

[00:28] 🖥 ShowUI: One Vision-Language-Action Model for GUI Visual Agent

[01:08] 🎥 Pathways on the Image Manifold: Image Editing via Video Generation

[01:45] ⭐ Star Attention: Efficient LLM Inference over Long Sequences

[02:24] ⚡ Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration

[03:01] 📊 MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs

[03:44] 🎨 TEXGen: a Generative Diffusion Model for Mesh Textures

[04:27] 🎨 SketchAgent: Language-Driven Sequential Sketch Generation

[05:11] 🔄 Learning 3D Representations from Procedural 3D Programs

[05:55] 🧠 VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

[06:50] 🔄 SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE

[07:27] 🖼 FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

[08:09] 🎨 DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting

[08:41] 📹 SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis

[09:19] 📉 Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens

[10:05] 🧬 MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

[10:40] 👕 Controllable Human Image Generation with Personalized Multi-Garments

[11:12] 🤖 Visual Counter Turing Test (VCT^2): Discovering the Challenges for AI-Generated Image Detection and Introducing Visual AI Index (V_AI)

[11:55] 🎥 AnchorCrafter: Animate CyberAnchors Saling Your Products via Human-Object Interacting Video Generation

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递

By duan
