HuggingFace Daily AI Paper Digest (HuggingFace 每日AI论文速递)

2025.02.14 | Extending context to 3 million tokens on a single GPU; a memory-efficient text encoder strategy.


The 18 papers in this episode are:

[00:21] 🚀 InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU

[01:07] 🖼 Skrr: Skip and Re-use Text Encoder Layers for Memory Efficient Text-to-Image Generation

[01:49] 🧠 An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging

[02:31] 📚 SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

[03:14] 🐕 Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights

[03:56] 🌐 Exploring the Potential of Encoder-free Architectures in 3D LMMs

[04:39] 🎭 CoSER: Coordinating LLM-Based Persona Simulation of Established Roles

[05:26] 🌐 TripoSG: High-Fidelity 3D Shape Synthesis using Large-Scale Rectified Flow Models

[06:09] 🤖 EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

[07:00] 🌪 Typhoon T1: An Open Thai Reasoning Model

[07:54] 🤖 Logical Reasoning in Large Language Models: A Survey

[08:36] 🧠 MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency

[09:23] 🧠 CoT-Valve: Length-Compressible Chain-of-Thought Tuning

[10:11] 🤖 SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models

[10:52] 🌐 mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data

[11:36] 🦜 The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

[12:18] 🤖 DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References

[13:00] 🔍 3CAD: A Large-Scale Real-World 3C Product Dataset for Unsupervised Anomaly Detection

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu: AI速递

By duan