October 17, 2024

2024.10.17 Daily AI Papers | Visual Reasoning Needs Improvement, Egocentric Video Understanding Falls Short

13 minutes

The 19 papers in this episode:

[00:28] 🧠 HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

[01:15] 🎥 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

[01:50] 🧠 The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

[02:31] 🤖 Revealing the Barriers of Language Agents in Planning

[03:15] 📄 DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

[03:56] ⚙ Large Language Model Evaluation via Matrix Nuclear-Norm

[04:38] 🧬 Exploring Model Kinship for Merging Large Language Models

[05:15] 📊 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

[05:50] ⚡ ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

[06:31] 📄 Improving Long-Text Alignment for Text-to-Image Diffusion Models

[07:11] 🔄 Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

[07:55] 🛡 Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

[08:34] 🔍 Tracking Universal Features Through Fine-Tuning and Model Merging

[09:08] 🔄 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL

[09:46] 🧠 Neural Metamorphosis

[10:25] 🌍 WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

[11:09] 🌐 OMCAT: Omni Context Aware Transformer

[11:44] ⏳ ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

[12:22] 📚 DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递

By duan
