October 10, 2024

2024.10.10 Daily AI Papers | LLMs perform unevenly in economic games; personalized visual instruction improves AI interaction.

29 minutes

The 43 papers in this episode are:

[00:23] 🤖 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

[01:09] 👤 Personalized Visual Instruction Tuning

[01:48] 🌍 Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

[02:35] 🖼 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

[03:17] 🔍 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

[03:54] 🌐 Aria: An Open Multimodal Native Mixture-of-Experts Model

[04:29] 🌐 Pixtral 12B

[05:09] 🎥 Pyramidal Flow Matching for Efficient Video Generative Modeling

[05:49] 🔗 Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

[06:29] 🎥 MM-Ego: Towards Building Egocentric Multimodal LLMs

[07:07] 🔄 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

[07:51] 📖 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

[08:33] 🚀 Self-Boosting Large Language Models with Synthetic Preference Data

[09:13] 🚀 Falcon Mamba: The First Competitive Attention-free 7B Language Model

[09:53] 🎨 TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

[10:24] ⏳ Temporal Reasoning Transfer from Text to Video

[10:54] 🎥 TRACE: Temporal Grounding Video LLM via Causal Event Modeling

[11:30] 📊 Data Selection via Optimal Control for Language Models

[12:07] 🤖 Response Tuning: Aligning Large Language Models without Instruction

[12:49] 🤖 CursorCore: Assist Programming through Aligning Anything

[13:36] 🎥 ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

[14:16] 🗣 Mixed-Session Conversation with Egocentric Memory

[14:57] 🎮 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

[15:41] 🔓 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

[16:26] 🎥 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

[17:00] 📖 Collective Critics for Creative Story Generation

[17:36] 🎵 Diversity-Rewarded CFG Distillation

[18:16] 🧠 Retrieval-Augmented Decision Transformer: External Memory for In-context RL

[18:57] 🎙 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

[19:32] 🎹 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

[20:20] 🧠 Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

[21:01] 🧬 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

[21:38] 🎥 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

[22:21] 🚨 Multimodal Situational Safety

[22:56] 💥 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders

[23:38] 🛠 Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach

[24:18] 🌐 Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

[24:55] 🤖 TinyEmo: Scaling down Emotional Reasoning via Metric Projection

[25:29] 🧠 MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

[26:08] 🎭 TextToon: Real-Time Text Toonify Head Avatar from Single Video

[26:49] 🤖 Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

[27:28] 📊 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

[28:03] 🧠 Does Spatial Cognition Emerge in Frontier Models?

【Follow Us】

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递

By duan
