HuggingFace Daily AI Paper Digest

2025.03.14 | CoSTA* improves multi-turn editing efficiency; the Silent Branding Attack exposes the vulnerability of diffusion models.



The 15 papers in this episode are:

[00:25] 🖼 CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing

[01:03] 🎭 Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models

[01:45] 🌍 World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning

[02:30] 🗺 Charting and Navigating Hugging Face's Model Atlas

[03:14] 🧠 GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing

[03:48] 🎨 CoRe^2: Collect, Reflect and Refine to Generate Better and Faster

[04:29] 🧠 Transformers without Normalization

[05:06] 🌐 GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding

[05:50] 🤖 New Trends for Modern Machine Translation with Large Reasoning Models

[06:32] 📝 Shifting Long-Context LLMs Research from Input to Output

[07:09] 🌐 VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search

[07:54] 🧠 DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation

[08:35] 🐱 Do I look like a `cat.n.01` to you? A Taxonomy Image Generation Benchmark

[09:20] 🎥 Open-Sora 2.0: Training a Commercial-Level Video Generation Model in $200k

[10:01] 🎥 Long Context Tuning for Video Generation

[Follow Us]

You can also find us on the following platform for more information beyond the podcast itself:

Xiaohongshu: AI速递


By duan