HuggingFace 每日AI论文速递 (HuggingFace Daily AI Paper Digest)

2024.10.10 Daily AI Papers | LLMs perform differently across economic games; personalized visual instructions improve AI interaction.


The 43 papers in this episode are as follows:

[00:23] 🤖 GLEE: A Unified Framework and Benchmark for Language-based Economic Environments

[01:09] 👤 Personalized Visual Instruction Tuning

[01:48] 🌍 Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

[02:35] 🖼 IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation

[03:17] 🔍 Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate

[03:54] 🌐 Aria: An Open Multimodal Native Mixture-of-Experts Model

[04:29] 🌐 Pixtral 12B

[05:09] 🎥 Pyramidal Flow Matching for Efficient Video Generative Modeling

[05:49] 🔗 Unveiling the Backbone-Optimizer Coupling Bias in Visual Representation Learning

[06:29] 🎥 MM-Ego: Towards Building Egocentric Multimodal LLMs

[07:07] 🔄 One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation

[07:51] 📖 Story-Adapter: A Training-free Iterative Framework for Long Story Visualization

[08:33] 🚀 Self-Boosting Large Language Models with Synthetic Preference Data

[09:13] 🚀 Falcon Mamba: The First Competitive Attention-free 7B Language Model

[09:53] 🎨 TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation

[10:24] ⏳ Temporal Reasoning Transfer from Text to Video

[10:54] 🎥 TRACE: Temporal Grounding Video LLM via Causal Event Modeling

[11:30] 📊 Data Selection via Optimal Control for Language Models

[12:07] 🤖 Response Tuning: Aligning Large Language Models without Instruction

[12:49] 🤖 CursorCore: Assist Programming through Aligning Anything

[13:36] 🎥 ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler

[14:16] 🗣 Mixed-Session Conversation with Egocentric Memory

[14:57] 🎮 ING-VP: MLLMs cannot Play Easy Vision-based Games Yet

[15:41] 🔓 AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs

[16:26] 🎥 T2V-Turbo-v2: Enhancing Video Generation Model Post-Training through Data, Reward, and Conditional Guidance Design

[17:00] 📖 Collective Critics for Creative Story Generation

[17:36] 🎵 Diversity-Rewarded CFG Distillation

[18:16] 🧠 Retrieval-Augmented Decision Transformer: External Memory for In-context RL

[18:57] 🎙 F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

[19:32] 🎹 FürElise: Capturing and Physically Synthesizing Hand Motions of Piano Performance

[20:20] 🧠 Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning

[21:01] 🧬 Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning

[21:38] 🎥 BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way

[22:21] 🚨 Multimodal Situational Safety

[22:56] 💥 Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders

[23:38] 🛠 Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach

[24:18] 🌐 Jointly Generating Multi-view Consistent PBR Textures using Collaborative Control

[24:55] 🤖 TinyEmo: Scaling down Emotional Reasoning via Metric Projection

[25:29] 🧠 MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

[26:08] 🎭 TextToon: Real-Time Text Toonify Head Avatar from Single Video

[26:49] 🤖 Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA

[27:28] 📊 MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering

[28:03] 🧠 Does Spatial Cognition Emerge in Frontier Models?

[Follow Us]

You can also find us on the following platforms for more information beyond the podcast content:

Xiaohongshu: AI速递


By duan
