The 17 papers covered in this episode:
[00:24] 📚 2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
[01:02] 🎥 VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
[01:39] 🎥 VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
[02:13] 🏆 CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
[02:52] 🎨 Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
[03:29] 🤖 ProgCo: Program Helps Self-Correction of Large Language Models
[04:03] 🗺 MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models
[04:41] 🤖 A3: Android Agent Arena for Mobile GUI Agents
[05:21] 🧪 Dynamic Scaling of Unit Tests for Code Reward Modeling
[05:57] 🛡 MLLM-as-a-Judge for Image Safety without Human Labeling
[06:40] 🎥 LTX-Video: Realtime Video Latent Diffusion
[07:15] 🗺 MapQaTor: A System for Efficient Annotation of Map Query Datasets
[07:51] 🔍 Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
[08:29] 🎥 SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
[09:13] 🤖 SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
[09:50] 🧠 Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
[10:27] 📊 Population Aware Diffusion for Time Series Generation
【Follow Us】
You can also find us on the following platform for more beyond the podcast:
Xiaohongshu (小红书): AI速递