April 26, 2025

Optimal Tool Calls in Language Model Reasoning

24 minutes

This paper addresses the issue of inefficient tool use by large language models in tool-integrated reasoning. It introduces a novel reinforcement learning framework called Optimal Tool Call-controlled Policy Optimization (OTC-PO). OTC-PO incentivizes models to produce accurate answers while minimizing the number of tool calls. This is achieved through a tool-integrated reward that considers both answer correctness and tool efficiency. Experiments show that OTC-PO significantly reduces tool calls and improves tool productivity without sacrificing accuracy on various question-answering benchmarks. The proposed method offers a way to train more cost-effective and intelligent language agents that can strategically utilize external tools.
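To make the idea of a reward that weighs both answer correctness and tool efficiency concrete, here is a minimal sketch. It is not the formula used by OTC-PO, which this summary does not state. The function names, the `optimal_calls` parameter, and the `1 / (1 + extra)` decay are all assumptions chosen for illustration.

```python
def tool_efficiency(num_calls: int, optimal_calls: int) -> float:
    """Hypothetical efficiency factor in (0, 1].

    Equals 1.0 when the model uses no more than `optimal_calls` tool calls
    and shrinks as extra calls are made. The decay shape is an assumption.
    """
    extra = max(0, num_calls - optimal_calls)
    return 1.0 / (1.0 + extra)


def tool_integrated_reward(is_correct: bool, num_calls: int, optimal_calls: int) -> float:
    """Illustrative reward: correctness scaled by tool efficiency.

    A wrong answer earns 0 regardless of how few tools were called, so the
    model cannot gain reward by skipping tools at the expense of accuracy.
    """
    correctness = 1.0 if is_correct else 0.0
    return correctness * tool_efficiency(num_calls, optimal_calls)


if __name__ == "__main__":
    # Two correct rollouts for the same question, differing only in tool usage.
    print(tool_integrated_reward(True, num_calls=1, optimal_calls=1))
    print(tool_integrated_reward(True, num_calls=3, optimal_calls=1))
    # An incorrect rollout with few tool calls.
    print(tool_integrated_reward(False, num_calls=0, optimal_calls=1))
```

Multiplying the two terms, rather than adding them, is one way to keep efficiency from compensating for a wrong answer. Under this sketch, a correct rollout with fewer tool calls always scores at least as high as a correct rollout with more.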

By Enoch H. Kang
