


The 15 papers in this episode are:
[00:24] 🧠 Qwen3-VL Technical Report
[00:57] 🧠 PretrainZero: Reinforcement Active Pretraining
[01:36] 🎬 ViDiC: Video Difference Captioning
[02:24] 🧠 OneThinker: All-in-one Reasoning Model for Image and Video
[03:07] 🔄 Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation
[03:59] ⚙ Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach
[04:46] 🤖 SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
[05:22] 🔧 Thinking with Programming Vision: Towards a Unified View for Thinking with Images
[06:01] 🔄 Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment
[06:51] 🎮 RELIC: Interactive Video World Model with Long-Horizon Memory
[07:34] 🍳 CookAnything: A Framework for Flexible and Consistent Multi-Step Recipe Image Generation
[08:26] 🧠 SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
[09:01] 📊 AlignBench: Benchmarking Fine-Grained Image-Text Alignment with Synthetic Image-Caption Pairs
[09:38] 🧠 SkillFactory: Self-Distillation For Learning Cognitive Behaviors
[10:20] 📱 UniQL: Unified Quantization and Low-rank Compression for Adaptive Edge LLMs
【Follow us】
You can also find us on the following platform for more information beyond the podcast content:
Xiaohongshu (小红书): AI速递
By duan5
