January 31, 2025

2025.01.31 | GuardReasoner improves LLM safety; MedXpertQA challenges medical AI reasoning.

6 minutes

This episode covers the following 8 papers:

[00:25] 🛡 GuardReasoner: Towards Reasoning-based LLM Safeguards

[01:04] 🩺 MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

[01:58] 🧠 Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

[02:40] 🌐 Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch

[03:20] 🌍 PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

[04:09] 🤖 WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training

[05:04] 🛡 o3-mini vs DeepSeek-R1: Which One is Safer?

[05:41] 🤔 Large Language Models Think Too Fast To Explore Effectively

[Follow Us]

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu: AI速递

HuggingFace 每日AI论文速递

By duan

Rating: 5 (22 ratings)
