HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.04.28 | Better understanding of camera motion in video; optimizing multimodal reasoning models


The 11 papers in this episode:

[00:22] 🎥 Towards Understanding Camera Motions in Any Video

[01:04] 🧠 Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning

[01:49] 💡 BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs

[02:28] 🌍 VideoVista-CulturalLingo: 360° Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension

[03:13] 🗣 Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark

[03:48] 🤔 The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs(稀疏前沿:Transformer LLM 中的稀疏注意力权衡)

[04:23] 🎬 Subject-driven Video Generation via Disentangled Identity and Motion

[05:00] 🧠 DianJin-R1: Evaluating and Enhancing Financial Reasoning in Large Language Models

[05:34] 🔲 DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency

[06:12] 🔊 Kimi-Audio Technical Report

[06:43] 🇮🇹 Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation

[Follow us]

You can also find us on the following platforms for more information beyond the podcast itself.

Xiaohongshu (小红书): AI速递

HuggingFace 每日AI论文速递, by duan