HuggingFace 每日AI论文速递 (HuggingFace Daily AI Papers Digest)

2024.10.17 Daily AI Papers | Visual reasoning has room to improve; egocentric video understanding needs work



The 19 papers in this episode are:

[00:28] 🧠 HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

[01:15] 🎥 VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

[01:50] 🧠 The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

[02:31] 🤖 Revealing the Barriers of Language Agents in Planning

[03:15] 📄 DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

[03:56] ⚙ Large Language Model Evaluation via Matrix Nuclear-Norm

[04:38] 🧬 Exploring Model Kinship for Merging Large Language Models

[05:15] 📊 ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs

[05:50] ⚡ ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification and KV Cache Compression

[06:31] 📄 Improving Long-Text Alignment for Text-to-Image Diffusion Models

[07:11] 🔄 Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models

[07:55] 🛡 Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements

[08:34] 🔍 Tracking Universal Features Through Fine-Tuning and Model Merging

[09:08] 🔄 Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL

[09:46] 🧠 Neural Metamorphosis

[10:25] 🌍 WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation

[11:09] 🌐 OMCAT: Omni Context Aware Transformer

[11:44] ⏳ ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

[12:22] 📚 DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities

[Follow us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (小红书): AI速递


By duan


