HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.03.04 | Strengthening visual reasoning and improving 3D reconstruction quality.



The 20 papers in this episode are:

[00:21] 🧠 Visual-RFT: Visual Reinforcement Fine-Tuning

[01:05] 🌐 Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

[01:43] 🧠 Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

[02:25] 🎥 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

[03:04] 🤔 When an LLM is apprehensive about its answers -- and when its uncertainty is justified

[03:46] 🎵 DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

[04:28] 🐯 Liger: Linearizing Large Language Models to Gated Recurrent Structures

[05:05] 📊 Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions

[05:50] 🧠 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

[06:28] ⚡ Speculative Ad-hoc Querying

[07:15] ⚡ DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

[07:52] 🎨 Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

[08:31] 🧠 Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia

[09:10] ⚡ From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

[09:47] 🔍 Large-Scale Data Selection for Instruction Tuning

[10:26] 🌐 SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity

[11:01] 🤖 CodeArena: A Collective Evaluation Platform for LLM Code Generation

[11:47] 🎥 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

[12:42] 🎙 PodAgent: A Comprehensive Framework for Podcast Generation

[13:18] 🏠 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

【Follow Us】

You can also find us on the following platform for more information beyond the podcast content:

Xiaohongshu (小红书): AI速递


HuggingFace 每日AI论文速递, by duan
