HuggingFace Daily AI Papers Digest

2025.05.26 | TabSTAR improves tabular data classification performance; QwenLong-L1 optimizes long-context reasoning



The 15 papers in this episode are:

[00:23] 📊 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations

[00:59] 🧠 QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

[01:43] 🤔 Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models

[02:19] 🚀 Quartet: Native FP4 Training Can Be Optimal for Large Language Models

[03:01] 🤖 One RL to See Them All: Visual Triple Unified Reinforcement Learning

[03:36] 🤖 Distilling LLM Agent into Small Models with Retrieval and Code Tools

[04:21] 🤔 PhyX: Does Your Model Have the "Wits" for Physical Reasoning?

[05:02] ♾ QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization

[05:46] 🧬 Scaling Image and Video Generation via Test-Time Evolutionary Search

[06:21] 🎬 Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model

[07:06] 🤔 VeriThinker: Learning to Verify Makes Reasoning Model Efficient

[07:45] 🧪 MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback

[08:27] 🎧 AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models

[09:10] 💻 FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow

[09:51] 🤥 Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content.

Xiaohongshu (小红书): AI速递


HuggingFace Daily AI Papers Digest, by duan