Learning GenAI via SOTA Papers

EP126: OrcaLoca locates bugs in massive codebases


OrcaLoca is an LLM agent framework designed to tackle the critical challenge of software issue localization—the process of precisely identifying and navigating to relevant code sections to fix bugs in large software repositories. Current LLM agents often struggle with localization due to complex codebases, inefficient action planning, and overwhelming context noise.

To address these challenges, OrcaLoca introduces three key components:

  • Priority-Based Scheduling for LLM-Guided Actions: It utilizes a dynamic priority queue to manage search actions, reordering them based on urgency and contextual relevance to avoid redundant and unstable search behaviors.
  • Action Decomposition with Relevance Scoring: It breaks down high-level actions (like searching an entire class or file) into finer sub-actions. A multi-agent workflow then scores and ranks these sub-actions based on their relevance to the bug, ensuring comprehensive but focused exploration.
  • Distance-Aware Searched Context Pruning: It dynamically filters out irrelevant code context using a CodeGraph. By computing the graph distance between search results and potential bug locations, it prunes unhelpful data to keep the LLM focused on the most relevant information.
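To make the first and third components concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not OrcaLoca's actual implementation. The names ActionQueue, graph_distances, and prune_context are hypothetical. So are the rule that a repeated action is simply ignored and the use of plain hop count as the graph distance. The toy graph stands in for a CodeGraph, and the priorities are hard-coded where the real system would rely on LLM guidance.

```python
import heapq
from collections import deque
from itertools import count


class ActionQueue:
    """Hypothetical max-priority queue of search actions with de-duplication."""

    def __init__(self):
        self._heap = []
        self._seen = set()
        self._tie = count()  # insertion counter keeps ordering stable on ties

    def push(self, action, priority):
        """Queue an action; return False if it was already scheduled."""
        if action in self._seen:
            return False  # assumption: repeated actions are ignored
        self._seen.add(action)
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._tie), action))
        return True

    def pop(self):
        """Return the highest-priority pending action."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def graph_distances(graph, sources):
    """Multi-source BFS hop distance over an adjacency dict."""
    dist = {s: 0 for s in sources}
    frontier = deque(sources)
    while frontier:
        node = frontier.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                frontier.append(nxt)
    return dist


def prune_context(graph, results, suspects, keep=2):
    """Keep the `keep` search results closest to the suspected bug locations."""
    dist = graph_distances(graph, suspects)
    # Unreachable results sort last; sorted() is stable, so ties keep input order.
    ranked = sorted(results, key=lambda r: dist.get(r, float("inf")))
    return ranked[:keep]


# Toy stand-in for a CodeGraph: file -> classes -> methods, edges in both directions.
TOY_GRAPH = {
    "pkg/models.py": ["Model", "Field"],
    "Model": ["pkg/models.py", "Model.save", "Model.clean"],
    "Model.save": ["Model"],
    "Model.clean": ["Model"],
    "Field": ["pkg/models.py", "Field.validate"],
    "Field.validate": ["Field"],
}

if __name__ == "__main__":
    queue = ActionQueue()
    queue.push("search_class Model", 1)
    queue.push("search_method Model.save", 3)
    queue.push("search_file pkg/models.py", 2)
    queue.push("search_class Model", 5)  # duplicate, ignored
    while len(queue):
        print(queue.pop())

    results = ["Field.validate", "Model.clean", "Model", "Field"]
    print(prune_context(TOY_GRAPH, results, suspects=["Model.save"]))
```

Run as a script, the queue yields the method search first, then the file search, then the class search, and the pruning step keeps "Model" and "Model.clean", the two results nearest the suspected location "Model.save". The second component, relevance scoring of sub-actions, is left out because it depends on LLM calls that a self-contained sketch cannot reproduce.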

Results: Through these innovations, OrcaLoca achieved a new open-source state-of-the-art (SOTA) on the SWE-bench Lite benchmark, reaching a 65.33% function match rate and an 83.33% file match rate. Furthermore, by integrating the patch generation capabilities of the Agentless-1.5 framework, OrcaLoca successfully resolved 41.00% of issues, marking a 6.33 percentage point improvement in the final resolved rate over the original Agentless framework.


Learning GenAI via SOTA Papers, by Yun Wu