The paper introduces Toolformer, a language model designed to overcome common limitations of standard Large Language Models (LLMs), such as their inability to perform precise math, access up-to-date information, or accurately look up facts without hallucinating.
To address these shortcomings, Toolformer is trained to decide on its own when and how to use external tools via simple APIs. The model integrates five tools: a calculator, a question answering system, a Wikipedia search engine, a machine translation system, and a calendar.
The key innovation of Toolformer is its self-supervised learning approach. The training process works by:
- Providing the model with a handful of human-written demonstrations of how an API can be used.
- Letting the model automatically annotate a large language modeling dataset with potential API calls.
- Executing these calls and filtering them: an API call is kept only if providing its response helps the model predict the subsequent tokens, i.e., reduces the cross-entropy loss over the continuation compared to making no call or making the call without receiving its response.
- Finetuning the model on the dataset containing only the API calls it found useful.
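The filtering step above can be sketched in a few lines. This is a minimal illustration, not the paper's code: `lm` stands in for a function returning the language model's loss on a continuation given a prefix, and the helper names, the ASCII `->` marker, and the threshold `tau` are all assumptions for the sake of the example.

```python
def keep_api_call(lm, text_before, api_call, api_result, text_after, tau=1.0):
    """Decide whether an annotated API call is worth keeping.

    `lm(prefix, continuation)` is a hypothetical stand-in for the
    average cross-entropy loss the model assigns to `continuation`
    given `prefix`.
    """
    # Loss when the call AND its result precede the continuation.
    loss_with_result = cross_entropy = lm(
        text_before + f"[{api_call} -> {api_result}]", text_after)
    # Baselines: no call at all, and the call without its response.
    loss_plain = lm(text_before, text_after)
    loss_call_only = lm(text_before + f"[{api_call}]", text_after)
    # Keep the call only if seeing the result lowers the loss on the
    # following tokens by at least the threshold tau.
    return min(loss_plain, loss_call_only) - loss_with_result >= tau
```

Only calls that clear the threshold survive into the finetuning dataset, so the model is never trained to make calls that do not pay for themselves in prediction quality.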
Through this process, Toolformer learns which APIs to call, when to call them, what arguments to pass, and how to incorporate the results into its text generation. An annotated training example, in the style of the paper, looks like: "Out of 1400 participants, 400 (or [Calculator(400 / 1400) → 0.29] 29%) passed the test."
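At inference time this learned behavior amounts to a decode loop that pauses when the model emits an API call, executes the tool, and splices the result back into the context before resuming. The sketch below is a simplification under stated assumptions: `generate` is a hypothetical stand-in for the model's sampler, the `tools` mapping and the ASCII `->` marker are illustrative, and real decoding works at the token level.

```python
import re

def decode_with_tools(generate, tools, prompt, max_steps=10):
    """Hypothetical Toolformer-style decoding loop.

    `generate(text)` returns the model's next chunk of text (empty
    string when done); `tools` maps tool names to Python callables.
    """
    text = prompt
    for _ in range(max_steps):
        chunk = generate(text)
        text += chunk
        # A completed call awaiting its result, e.g. "[Calculator(400 / 1400) ->"
        m = re.search(r"\[(\w+)\(([^)]*)\) ->$", text)
        if m:
            name, arg = m.group(1), m.group(2)
            # Execute the tool and splice its response into the context.
            text += f" {tools[name](arg)}]"
        if chunk == "":
            break
    return text
```

The key design point is that the tool result enters the model's context exactly where the call was made, so subsequent generation can condition on it.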
Experimental results show that Toolformer (based on a 6.7B parameter GPT-J model) significantly improves zero-shot performance across various downstream tasks. By teaching itself to use external tools, it frequently outperforms much larger models, such as the 175B parameter GPT-3, without sacrificing its core language modeling abilities.