HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.02.19 | Data-efficient speech processing and innovations in embedding-space compression.


The 20 papers in this episode are:

[00:25] 🎙 Soundwave: Less is More for Speech-Text Alignment in LLMs

[01:05] 🔍 Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity

[01:48] 🌊 Continuous Diffusion Model for Language Modeling

[02:30] 🎥 Phantom: Subject-consistent video generation via cross-modal alignment

[03:12] 🧠 Rethinking Diverse Human Preference Learning through Principal Component Analysis

[04:00] 🤖 SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

[04:36] 🛡 SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models

[05:25] 🐍 Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation

[06:08] 📚 You Do Not Fully Utilize Transformer's Representation Capacity

[06:50] 🤖 Magma: A Foundation Model for Multimodal AI Agents

[07:23] 💹 FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading

[08:08] 📄 RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm

[08:49] 🧠 PAFT: Prompt-Agnostic Fine-Tuning

[09:27] 🛠 OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

[10:13] 📊 Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?

[11:00] 🔄 MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections

[11:37] 🩺 HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation

[12:12] 🧠 HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

[12:51] 🌍 Text2World: Benchmarking Large Language Models for Symbolic World Model Generation

[13:32] 🧠 Atom of Thoughts for Markov LLM Test-Time Scaling

[Follow us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递


By duan
