
The 18 papers in this episode are listed below:
[00:24] 🤖 TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
[01:06] 🎥 AniDoc: Animation Creation Made Easier
[01:44] 👗 FashionComposer: Compositional Fashion Image Generation
[02:28] 🤖 Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
[03:05] 🌐 Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
[03:42] 🔄 Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
[04:26] 🤖 GUI Agents: A Survey
[05:12] 🌍 AnySat: An Earth Observation Model for Any Resolutions, Scales, and Modalities
[05:51] 📊 RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
[06:40] 🧠 LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
[07:30] 🤖 Learning from Massive Human Videos for Universal Humanoid Pose Control
[08:05] 🤖 ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
[08:49] 🎥 VidTok: A Versatile and Open-Source Video Tokenizer
[09:28] 🧠 Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
[10:13] 🔄 CAD-Recode: Reverse Engineering CAD Code from Point Clouds
[10:54] 🤖 AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
[11:39] 🤖 Alignment faking in large language models
[12:19] ⚡ FastVLM: Efficient Vision Encoding for Vision Language Models
【Follow us】
You can also find us on the following platforms for more information beyond the podcast content
Xiaohongshu: AI速递