HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.11.21 | V-ReasonBench tests the reasoning of video models; Step-Audio-R1 makes audio models stronger the more they “think”



The 15 papers in this episode are:

[00:22] 📊 V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

[01:06] 🧠 Step-Audio-R1 Technical Report

[01:48] 🧭 Scaling Spatial Intelligence with Multimodal Foundation Models

[02:18] 🎬 First Frame Is the Place to Go for Video Content Customization

[02:49] 🎬 Video-as-Answer: Predict and Generate Next Video Event with Joint-GRPO

[03:29] 🔮 SAM 3D: 3Dfy Anything in Images

[04:03] 🚀 MiMo-Embodied: X-Embodied Foundation Model Technical Report

[04:38] 🧠 Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation

[05:10] 🏆 TurkColBERT: A Benchmark of Dense and Late-Interaction Models for Turkish Information Retrieval

[05:53] 🌀 Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs

[06:26] 🚀 SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models

[07:09] 🎬 TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding

[07:46] 🔬 SAM2S: Segment Anything in Surgical Videos via Semantic Long-term Tracking

[08:23] 🎨 NaTex: Seamless Texture Generation as Latent Color Diffusion

[08:58] 📐 PartUV: Part-Based UV Unwrapping of 3D Meshes

[Follow Us]

You can also find us on the following platform for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递


By duan
