Seventy3

[Episode 94] AgentTrek: A Pipeline for Generating High-Quality Data for GUI Agents



Seventy3: turning papers into podcasts with NotebookLM, so everyone can keep learning alongside AI.

Today's topic: AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Summary

The paper introduces AgentTrek, a pipeline for synthesizing high-quality training data for Graphical User Interface (GUI) agents. AgentTrek uses web tutorials to generate large-scale, multi-step agent trajectories, which significantly reduces cost and effort compared with human annotation. The pipeline automatically gathers and processes tutorials, uses a visual-language model (VLM) to simulate task execution, and applies an evaluator to ensure data quality. Experiments show that agents trained on the synthesized data significantly outperform those trained on existing datasets, with gains in both grounding and planning. The resulting dataset is multimodal, including screenshots, accessibility trees, and reasoning traces.
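
The three stages described in the summary (gather tutorials, replay them with an agent, filter with an evaluator) can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the names `Tutorial`, `Trajectory`, `collect_tutorials`, `replay`, `synthesize`, `stub_agent`, and `stub_evaluator` are hypothetical, the agent and evaluator are plain stub functions rather than real VLM calls, and observations are placeholder strings standing in for screenshots and accessibility trees. It is not the paper's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Tutorial:
    task: str          # what the tutorial teaches
    steps: List[str]   # step-by-step instructions


@dataclass
class Trajectory:
    task: str
    observations: List[str] = field(default_factory=list)  # stand-ins for screenshots / accessibility trees
    thoughts: List[str] = field(default_factory=list)      # reasoning traces
    actions: List[str] = field(default_factory=list)


# Hypothetical signatures: a real system would call a VLM here.
Agent = Callable[[str, str, str], Tuple[str, str]]   # (task, step, observation) -> (thought, action)
Evaluator = Callable[[Trajectory], bool]


def collect_tutorials(raw_pages: List[str]) -> List[Tutorial]:
    """Stage 1 (assumed heuristic): first line is the task, the rest are steps."""
    tutorials = []
    for page in raw_pages:
        lines = [ln.strip() for ln in page.splitlines() if ln.strip()]
        if len(lines) < 2:
            continue  # a title with no steps is not a usable tutorial
        tutorials.append(Tutorial(task=lines[0], steps=lines[1:]))
    return tutorials


def replay(tutorial: Tutorial, agent: Agent) -> Trajectory:
    """Stage 2: let the agent act step by step, recording what it saw, thought, and did."""
    traj = Trajectory(task=tutorial.task)
    for i, step in enumerate(tutorial.steps):
        observation = f"screen-state-{i}"  # placeholder for a real environment observation
        thought, action = agent(tutorial.task, step, observation)
        traj.observations.append(observation)
        traj.thoughts.append(thought)
        traj.actions.append(action)
    return traj


def synthesize(raw_pages: List[str], agent: Agent, evaluator: Evaluator) -> List[Trajectory]:
    """Stage 3: keep only the trajectories the evaluator accepts."""
    trajectories = (replay(t, agent) for t in collect_tutorials(raw_pages))
    return [traj for traj in trajectories if evaluator(traj)]


def stub_agent(task: str, step: str, observation: str) -> Tuple[str, str]:
    # Stub: echoes the tutorial step as the action.
    return (f"To '{task}', next: {step}", f"do({step!r})")


def stub_evaluator(traj: Trajectory) -> bool:
    # Stub: accept any trajectory with at least one non-empty action.
    return len(traj.actions) > 0 and all(traj.actions)


pages = [
    "Reset a password\nOpen settings\nClick 'Reset password'",
    "Just a headline",
]
dataset = synthesize(pages, stub_agent, stub_evaluator)
print(len(dataset), dataset[0].actions)
```

Keeping the agent and evaluator as injected callables is a design choice of this sketch: it lets the stubs be swapped for model-backed versions without touching the pipeline logic.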


Original paper: https://arxiv.org/abs/2412.09605


Seventy3, by 任雨山