


The 15 papers in this episode are:
[00:23] 🧠 MMGR: Multi-Modal Generative Reasoning
[01:14] 🎮 WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling
[01:47] 🤖 Video Reality Test: Can AI-Generated ASMR Videos fool VLMs and Humans?
[02:46] 🎨 Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling
[03:29] 🤖 RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
[04:13] 📊 OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value
[04:50] 🎨 Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure
[05:36] 🧊 Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views
[06:14] 🧠 RecGPT-V2 Technical Report
[07:04] 📊 ShowTable: Unlocking Creative Table Visualization with Collaborative Reflection and Refinement
[07:43] 🎬 MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
[08:22] 🧠 VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse
[09:04] 🎨 Feedforward 3D Editing via Text-Steerable Image-to-3D
[09:52] 🤖 A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
[10:26] 🎬 SS4D: Native 4D Generative Model via Structured Spacetime Latents
[Follow Us]
You can also find us on the following platforms for more information beyond the podcast itself:
Xiaohongshu: AI速递
By duan5
