
TL;DR: We present a retrieval-augmented LM system that approaches the performance of the human crowd on judgemental forecasting.
Paper: https://arxiv.org/abs/2402.18563 (Danny Halawi*, Fred Zhang*, Chen Yueh-Han*, and Jacob Steinhardt)
Abstract. Forecasting future events is important for policy and decision-making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. On a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters and, in some settings, surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions [...]
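For readers curious what such a pipeline looks like in code, here is a minimal Python sketch of the three stages the abstract names: retrieval, forecast generation, and aggregation. The `search_news` and `call_lm` helpers, the prompt wording, and the median aggregation are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of a retrieval-augmented forecasting pipeline:
# retrieve relevant news, prompt an LM for a probability, aggregate samples.
# `search_news` and `call_lm` are hypothetical stand-in callables.
import re
import statistics
from typing import Callable

def retrieve(question: str, search_news: Callable[[str], list[str]],
             k: int = 10) -> list[str]:
    """Fetch up to k news snippets relevant to the question."""
    return search_news(question)[:k]

def forecast_once(question: str, articles: list[str],
                  call_lm: Callable[[str], str]) -> float:
    """Ask the LM to reason over retrieved context and output a probability."""
    context = "\n\n".join(articles)
    prompt = (
        f"Question: {question}\n\nRelevant articles:\n{context}\n\n"
        "Reason step by step, then give your final answer as "
        "'Probability: X' where X is between 0 and 1."
    )
    reply = call_lm(prompt)
    match = re.search(r"Probability:\s*([01](?:\.\d+)?)", reply)
    return float(match.group(1)) if match else 0.5  # fall back to ignorance

def aggregate(samples: list[float]) -> float:
    """Combine several sampled forecasts; the median is robust to outliers."""
    return statistics.median(samples)

def predict(question: str, search_news, call_lm, n_samples: int = 5) -> float:
    articles = retrieve(question, search_news)
    samples = [forecast_once(question, articles, call_lm)
               for _ in range(n_samples)]
    return aggregate(samples)
```

Sampling the LM several times and taking the median is one simple way to aggregate; the paper evaluates its own system end-to-end against human crowd aggregates rather than prescribing this particular scheme.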
---
Outline:
(01:29) Current LMs are not naturally good at forecasting
(02:19) System building
(03:24) Data and models
(04:14) Evaluation results
(04:37) Unconditional setting
(05:03) Selective setting
(06:16) Calibration
(06:54) Future work
---
Narrated by TYPE III AUDIO.