Learning GenAI via SOTA Papers

EP121: How ToolLLaMA mastered 16000 real world APIs



This paper introduces ToolLLM, a general framework for equipping open-source large language models (LLMs) with the ability to use more than 16,000 real-world APIs. Closed-source models such as ChatGPT excel at using external tools, but open-source models such as LLaMA fall short because their instruction tuning focuses mainly on basic language tasks rather than tool use. Existing tool-learning datasets are also limited: they cover few real-world APIs, are restricted to single-tool scenarios, and rely on weaker planning and reasoning methods.

To address these issues, the researchers developed several key components:

  • ToolBench: An instruction-tuning dataset constructed automatically using ChatGPT. The creation process involved collecting 16,464 RESTful APIs from RapidAPI, prompting ChatGPT to generate diverse single-tool and multi-tool instructions, and annotating the solution paths.
  • Depth-First Search-based Decision Tree (DFSDT): A novel reasoning algorithm developed to overcome the limitations of standard methods like Chain-of-Thought (CoT) and ReAct, which commit to a single reasoning path. DFSDT broadens the search space by allowing the model to evaluate multiple reasoning paths, deliberately retract steps, and avoid getting trapped in faulty loops.
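  • ToolEval: An automatic evaluator backed by ChatGPT that reports two metrics for the model's tool use: "pass rate" (the share of instructions executed successfully within a limited budget) and "win rate" (how often the model's solution path is judged better than a reference model's path for the same instruction).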
  • Neural API Retriever: A dense retriever trained to automatically recommend the most relevant APIs from a massive pool for any given instruction.
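The backtracking idea behind DFSDT can be illustrated with a minimal sketch. This is not the authors' implementation: `dfs_decision_tree`, `propose`, `is_solution`, and the toy API names are all hypothetical, and a hard-coded function stands in for the LLM that would normally propose the next API calls.

```python
from typing import Callable, List, Optional, Sequence


def dfs_decision_tree(
    propose: Callable[[Sequence[str]], List[str]],
    is_solution: Callable[[Sequence[str]], bool],
    max_depth: int = 4,
    max_nodes: int = 50,
) -> Optional[List[str]]:
    """Depth-first search over action sequences with backtracking.

    propose(path) stands in for the LLM suggesting next actions (assumption).
    is_solution(path) stands in for the check that the task is solved (assumption).
    """
    visited = 0

    def visit(path: List[str]) -> Optional[List[str]]:
        nonlocal visited
        visited += 1
        if is_solution(path):
            return list(path)
        if len(path) >= max_depth or visited >= max_nodes:
            return None  # abandon this branch: too deep or out of budget
        for action in propose(path):
            path.append(action)
            found = visit(path)
            path.pop()  # retract the step before trying a sibling branch
            if found is not None:
                return found
            if visited >= max_nodes:
                return None
        return None

    return visit([])


# Toy stand-ins for the model and the task (hypothetical API names).
def toy_propose(path: Sequence[str]) -> List[str]:
    if not path:
        return ["get_weather", "search_flights"]
    if path[-1] == "get_weather":
        return ["get_weather"]  # a faulty loop a single-path method could get stuck in
    if path[-1] == "search_flights":
        return ["book_flight"]
    return []


def toy_is_solution(path: Sequence[str]) -> bool:
    return bool(path) and path[-1] == "book_flight"


plan = dfs_decision_tree(toy_propose, toy_is_solution)
print(plan)
```

In this toy run the search first follows the `get_weather` branch, hits the depth limit, retracts those steps, and then succeeds on the `search_flights` branch. A single-path method would have no way to leave the first branch.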

By fine-tuning LLaMA-2 on the ToolBench dataset, the authors produced ToolLLaMA. Experiments demonstrate that ToolLLaMA performs comparably to ChatGPT and significantly outperforms other open-source models. It exhibits a remarkable ability to execute complex, multi-step instructions and can successfully generalize to entirely unseen APIs just by reading their documentation. ToolLLaMA also shows strong out-of-distribution generalization on external datasets like APIBench.
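The two ToolEval metrics behind these comparisons reduce to simple proportions once the evaluator has produced its judgments. The sketch below assumes those judgments are already available as booleans; the function names and sample values are hypothetical, and the ChatGPT-based judging step itself is not shown.

```python
from typing import List


def pass_rate(passed: List[bool]) -> float:
    """Share of instructions the model completed successfully."""
    return sum(passed) / len(passed) if passed else 0.0


def win_rate(preferred: List[bool]) -> float:
    """Share of instructions where the model's solution path was preferred
    over the reference model's path (True means the model won)."""
    return sum(preferred) / len(preferred) if preferred else 0.0


# Made-up judgments for four instructions, for illustration only.
passes = [True, True, False, True]
wins = [True, False, True, False]

print(pass_rate(passes))  # 0.75
print(win_rate(wins))     # 0.5
```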


By Yun Wu