
Seventy3: Turning papers into podcasts with NotebookLM, so everyone can keep learning alongside AI.
Today's topic: Agentless: Demystifying LLM-based Software Engineering Agents
Summary
This research paper introduces AGENTLESS, a novel approach to automated software development that eschews complex autonomous agents. Instead, AGENTLESS employs a simpler three-phase process: localization, repair, and patch validation, leveraging large language models (LLMs) for each phase. The authors benchmark AGENTLESS against existing agent-based systems on SWE-bench Lite, demonstrating surprisingly high performance and low cost. They further analyze SWE-bench Lite, identifying problematic issues and creating a refined dataset, SWE-bench Lite-S, for more robust evaluation. Finally, the study highlights AGENTLESS's adoption by OpenAI and its superior performance on their SWE-bench Verified benchmark.
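To make the three-phase flow concrete, here is a minimal Python sketch of a fixed localize, repair, validate pipeline. It is an illustration only, not the authors' implementation: the `Issue` class, the function names, the prompt wording, and the `llm` and `passes_tests` callables are all hypothetical stand-ins, and each phase is far simpler than anything a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class Issue:
    description: str        # text of the bug report
    files: Dict[str, str]   # path -> source text of the repository


def localize(issue: Issue, llm: LLM) -> str:
    """Phase 1: ask the model which file most likely needs a fix."""
    listing = "\n".join(sorted(issue.files))
    prompt = (
        f"Issue:\n{issue.description}\n\nFiles:\n{listing}\n\n"
        "Which file most likely needs a fix? Answer with one path."
    )
    answer = llm(prompt).strip()
    if answer not in issue.files:
        raise ValueError(f"model named an unknown file: {answer!r}")
    return answer


def repair(issue: Issue, path: str, llm: LLM, samples: int = 3) -> List[str]:
    """Phase 2: sample several candidate rewrites of the chosen file."""
    return [
        llm(
            f"Issue:\n{issue.description}\n\nAttempt {i}\n\n"
            f"Rewrite this file to fix the issue:\n{issue.files[path]}"
        )
        for i in range(samples)
    ]


def validate(candidates: List[str],
             passes_tests: Callable[[str], bool]) -> Optional[str]:
    """Phase 3: return the first candidate that passes the checks, if any."""
    for candidate in candidates:
        if passes_tests(candidate):
            return candidate
    return None


def run_pipeline(issue: Issue, llm: LLM,
                 passes_tests: Callable[[str], bool]) -> Optional[str]:
    """Run the three phases in a fixed order, with no agent loop."""
    path = localize(issue, llm)
    candidates = repair(issue, path, llm)
    return validate(candidates, passes_tests)
```

The point of the sketch is the control flow: the order of the steps is fixed in ordinary code, and the model is only asked to fill in each step, rather than deciding for itself which action to take next.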
Original paper: https://arxiv.org/abs/2407.01489