The 15 papers covered in this episode:
[00:23] 📊 TabSTAR: A Foundation Tabular Model With Semantically Target-Aware Representations
[00:59] 🧠 QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
[01:43] 🤔 Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models
[02:19] 🚀 Quartet: Native FP4 Training Can Be Optimal for Large Language Models
[03:01] 🤖 One RL to See Them All: Visual Triple Unified Reinforcement Learning
[03:36] 🤖 Distilling LLM Agent into Small Models with Retrieval and Code Tools
[04:21] 🤔 PhyX: Does Your Model Have the "Wits" for Physical Reasoning?
[05:02] ♾ QwenLong-CPRS: Towards $\infty$-LLMs with Dynamic Context Optimization
[05:46] 🧬 Scaling Image and Video Generation via Test-Time Evolutionary Search
[06:21] 🎬 Model Already Knows the Best Noise: Bayesian Active Noise Selection via Attention in Video Diffusion Model
[07:06] 🤔 VeriThinker: Learning to Verify Makes Reasoning Model Efficient
[07:45] 🧪 MOOSE-Chem3: Toward Experiment-Guided Hypothesis Ranking via Simulated Experimental Feedback
[08:27] 🎧 AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
[09:10] 💻 FullFront: Benchmarking MLLMs Across the Full Front-End Engineering Workflow
[09:51] 🤥 Teaching with Lies: Curriculum DPO on Synthetic Negatives for Hallucination Detection
【Follow Us】
You can also find us on the following platforms for more information beyond the podcast:
Xiaohongshu: AI速递