The 15 papers covered in this episode:
[00:23] 📊 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
[00:59] 🧠 QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
[01:43] 🤔 Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
[02:19] 🚀 Quartet: Native FP4 Training Can Be Optimal for Large Language Models
[03:01] 🤖 One RL to See Them All: Visual Triple Unified Reinforcement Learning
[03:36] 🤖 Distilling LLM Agent into Small Models with Retrieval and Code Tools
[04:21] 🤔 PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
[05:02] ♾ QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
[05:46] 🧬 Scaling Image and Video Generation via Test-Time Evolutionary Search
[06:21] 🎬 Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
[07:06] 🤔 VeriThinker: Learning to Verify Makes Reasoning Model Efficient
[07:45] 🧪 MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
[08:27] 🎧 AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
[09:10] 💻 FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
[09:51] 🤥 Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递