HuggingFace Daily AI Paper Digest

2025.05.15 | Decoupled learning improves perception performance; multimodal models improve image generation.



The 11 papers in this episode are:

[00:23] 🖼 DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception

[01:02] 🖼 BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

[01:41] 💡 Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

[02:24] 🎨 Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis

[03:00] 🤖 UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations

[03:42] 🐛 SweRank: Software Issue Localization with Code Ranking

[04:23] 🤔 VCRBench: Exploring Long-form Causal Reasoning Capabilities of Large Video Language Models

[05:14] 🖼 CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image

[05:49] 🤔 Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?

[06:27] 🤔 Visually Interpretable Subtask Reasoning for Visual Question Answering

[06:59] 🚁 DetReIDX: A Stress-Test Dataset for Real-World UAV-Based Person Recognition

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu (小红书): AI速递


HuggingFace Daily AI Paper Digest, by duan