October 15, 2025

2025.10.15 | 像素级自监督ViT刷新生成基准；多智能体评测网文翻译新标尺

Listen Later

10 minutes

本期的 14 篇论文如下：

[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training（通过自监督预训练推进端到端像素空间生成建模）

[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation（DITING：面向网络小说翻译评测的多智能体基准框架）

[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning（以语言为中心的跨模态表征扩展学习）

[02:29] 🎯 Detect Anything via Next Point Prediction（通过下一点预测检测万物）

[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution（FlashVSR：迈向实时扩散式流媒体视频超分辨率）

[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models（时间对齐引导：扩散模型中的流形采样）

[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs（Dr.LLM：大模型中的动态层级路由）

[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model（空间强迫：面向视觉-语言-动作模型的隐式空间表征对齐）

[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning（ERA：借助具身先验学习与在线强化学习将视觉-语言模型转化为具身智能体）

[06:35] 🤖 Robot Learning: A Tutorial（机器人学习教程：从强化学习到多任务通用模型）

[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models（SRUM：面向统一多模态模型的细粒度自奖励机制）

[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models（面向扩散大语言模型的边界引导策略优化：内存高效的强化学习）

[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation（UniFusion：将视觉-语言模型统一作为图像生成的编码器）

[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks（记忆即行动：面向长程智能体任务的自主上下文策展）

【关注我们】

您还可以在以下平台找到我们，获得播客内容以外更多信息

小红书: AI速递

...more

View all episodes

View all episodes

Download on the App Store

Download on the App Store

Get it on Google Play

HuggingFace 每日AI论文速递

By duan

5

22 ratings

October 15, 2025

2025.10.15 | 像素级自监督ViT刷新生成基准；多智能体评测网文翻译新标尺

Listen Later

10 minutes

本期的 14 篇论文如下：

[00:20] 🖼 Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training（通过自监督预训练推进端到端像素空间生成建模）

[00:53] 📚 DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation（DITING：面向网络小说翻译评测的多智能体基准框架）

[01:41] 🌐 Scaling Language-Centric Omnimodal Representation Learning（以语言为中心的跨模态表征扩展学习）

[02:29] 🎯 Detect Anything via Next Point Prediction（通过下一点预测检测万物）

[03:02] ⚡ FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution（FlashVSR：迈向实时扩散式流媒体视频超分辨率）

[03:40] 🎯 Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models（时间对齐引导：扩散模型中的流形采样）

[04:16] 🧠 Dr.LLM: Dynamic Layer Routing in LLMs（Dr.LLM：大模型中的动态层级路由）

[05:03] 🎯 Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model（空间强迫：面向视觉-语言-动作模型的隐式空间表征对齐）

[05:50] 🤖 ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning（ERA：借助具身先验学习与在线强化学习将视觉-语言模型转化为具身智能体）

[06:35] 🤖 Robot Learning: A Tutorial（机器人学习教程：从强化学习到多任务通用模型）

[07:27] 🔄 SRUM: Fine-Grained Self-Rewarding for Unified Multimodal Models（SRUM：面向统一多模态模型的细粒度自奖励机制）

[08:01] 🧠 Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models（面向扩散大语言模型的边界引导策略优化：内存高效的强化学习）

[09:06] 🖼 UniFusion: Vision-Language Model as Unified Encoder in Image Generation（UniFusion：将视觉-语言模型统一作为图像生成的编码器）

[09:43] 🧠 Memory as Action: Autonomous Context Curation for Long-Horizon Agentic Tasks（记忆即行动：面向长程智能体任务的自主上下文策展）

【关注我们】

您还可以在以下平台找到我们，获得播客内容以外更多信息

小红书: AI速递

...more

More shows like HuggingFace 每日AI论文速递

硅谷101|中国版 by 泓君Jane

硅谷101|中国版

56 Listeners

商业就是这样 by 商业就是这样

商业就是这样

291 Listeners

声动早咖啡 by 声动活泼

声动早咖啡

294 Listeners

思文，败类 by 思文败类

思文，败类

156 Listeners

不开玩笑 Jokes Aside by 不开玩笑JokesAside

不开玩笑 Jokes Aside

135 Listeners

人民公园说AI by JustSayAI

人民公园说AI

7 Listeners

數創實驗室 - AI時代的學習指南 by Vincent在數創

數創實驗室 - AI時代的學習指南

1 Listeners

AI可可AI生活 by fly51fly

AI可可AI生活

0 Listeners