ずんだもんのHugging Faceニュース

Daily AI Papers Briefing (2026-04-09)



[Today's Papers]
1. Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding
https://huggingface.co/papers/2604.05015
2. Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents
https://huggingface.co/papers/2604.06132
3. Learning to Retrieve from Agent Trajectories
https://huggingface.co/papers/2604.04949
4. ACES: Who Tests the Tests? Leave-One-Out AUC Consistency for Code Generation
https://huggingface.co/papers/2604.03922
5. GBQA: A Game Benchmark for Evaluating LLMs as Quality Assurance Engineers
https://huggingface.co/papers/2604.02648

ずんだもんのHugging Faceニュース, by ksterx