HuggingFace Daily AI Paper Digest

2025.06.04 | Reinforcement learning improves LLM performance; UniWorld unifies visual understanding and generation.


The 15 papers in this episode are:

[00:23] 💡 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning

[01:09] 🖼 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

[01:53] 🧪 CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs

[02:37] 🤖 VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments

[03:15] 🧠 SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis

[04:01] 🧠 OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models

[04:47] 🤖 Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces

[05:24] 👀 MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs

[06:10] 🤖 GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents

[06:48] 🎬 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers

[07:27] 🧩 DINGO: Constrained Inference for Diffusion LLMs

[08:10] 🎬 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation

[08:47] 🤖 Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

[09:35] 🤖 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning

[10:21] 🖼 Native-Resolution Image Synthesis

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast:

Xiaohongshu: AI速递


By duan