HuggingFace 每日AI论文速递

2025.02.18 | Sparse attention boosts efficiency; getting-up policies for humanoid robots refined.



This episode covers the following 29 papers:

[00:23] ⚡ Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

[01:10] 🤖 Learning Getting-Up Policies for Real-World Humanoid Robots

[01:55] 🧠 ReLearn: Unlearning via Learning for Large Language Models

[02:35] 💻 SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?

[03:21] 🌐 HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation

[03:58] 🧠 How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training

[04:33] 🤖 SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

[05:12] 🔧 Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening

[05:55] 🧠 I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

[06:38] 🔧 SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL

[07:25] 🧠 CRANE: Reasoning with constrained LLM generation

[08:07] 🧠 Intuitive physics understanding emerges from self-supervised pretraining on natural videos

[08:46] 🐦 Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest

[09:22] 🧠 Dyve: Thinking Fast and Slow for Dynamic Process Verification

[10:06] 🧠 PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning

[10:53] 🤖 System Message Generation for User Preferences using Open-Source Models

[11:38] 🎥 video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model

[12:33] 🧠 Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarsity

[13:11] 🤖 Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

[13:52] 🤖 MagicArticulate: Make Your 3D Models Articulation-Ready

[14:37] 🤖 Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems

[15:21] 🧠 One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs

[16:03] 🤖 Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model

[16:40] 🚀 Better Embeddings with Coupled Adam

[17:18] 🧐 Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking

[17:56] 🧪 Towards Data-Efficient Pretraining for Atomic Property Prediction

[18:46] 🌀 The Mirage of Model Editing: Revisiting Evaluation in the Wild

[19:31] 🧮 Large Language Models and Mathematical Reasoning Failures

[20:11] 📊 Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu: AI速递


