HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2025.06.18 | MultiFinBen exposes the limitations of financial models; test-time compute boosts LLM agent performance.



The 15 papers in this episode:

[00:23] 📊 MultiFinBen: A Multilingual, Multimodal, and Difficulty-Aware Benchmark for Financial LLM Evaluation

[01:03] 🤖 Scaling Test-time Compute for LLM Agents

[01:38] 🎼 CMI-Bench: A Comprehensive Benchmark for Evaluating Music Instruction Following

[02:16] 💬 LongLLaDA: Unlocking Long Context Capabilities in Diffusion LLMs

[02:57] 🤔 Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs

[03:40] 🧠 Xolver: Multi-Agent Reasoning with Holistic Experience Learning Just Like an Olympiad Team

[04:20] 🗣 Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

[05:02] ⚕ Efficient Medical VIE via Reinforcement Learning

[05:40] 🤔 Reasoning with Exploration: An Entropy Perspective

[06:18] 🧠 QFFT, Question-Free Fine-Tuning for Adaptive Reasoning

[06:52] 🎨 Align Your Flow: Scaling Continuous-Time Flow Map Distillation

[07:27] 🧪 Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure

[08:07] 🤖 Guaranteed Guess: A Language Modeling Approach for CISC-to-RISC Transpilation with Testing Guarantees

[08:58] 🛠 CRITICTOOL: Evaluating Self-Critique Capabilities of Large Language Models in Tool-Calling Error Scenarios

[09:38] 📊 xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

【Follow Us】

You can also find us on the following platform for more content beyond the podcast:

Xiaohongshu (小红书): AI速递


HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest) by duan


