Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Rationale-Shaped Hole At The Heart Of Forecasting, published by dschwarz on April 2, 2024 on The Effective Altruism Forum.
Thanks to Eli Lifland, Molly Hickman, Değer Turan, and Evan Miyazono for reviewing drafts of this post. The opinions expressed here are my own.
Summary:
Forecasters produce reasons and models that are often more valuable than the final forecasts
Most of this value is being lost due to the historical practice & incentives of forecasting, and the difficulty crowds have in "adversarially collaborating"
FutureSearch is a forecasting system with legible reasons and models at its core (examples at the end)
The Curious Case of the Missing Reasoning
Ben Landau-Taylor of Bismarck Analysis wrote a piece on March 6 called "Probability Is Not A Substitute For Reasoning", citing a piece where he writes:
There has been a great deal of research on what criteria must be met for forecasting aggregations to be useful, and as Karger, Atanasov, and Tetlock argue, predictions of events such as the arrival of AGI are a very long way from fulfilling them.
Last summer, Tyler Cowen wrote on AGI ruin forecasts:
Publish, publish, not on blogs, not long stacked arguments or six hour podcasts or tweet storms, no, rather peer review, peer review, peer review, and yes with models too... if you wish to convince your audience of one of the most radical conclusions of all time…well, more is needed than just a lot of vertically stacked arguments.
Widely divergent views and forecasts on AGI persist, leading to FRI's excellent adversarial collaboration on forecasting AI risk this month. Reading it, I saw… a lot of vertically stacked arguments.
There have been other big advances in judgmental forecasting recently, on non-AGI AI, Covid-19 origins, and scientific progress. How well justified are the forecasts?
Feb 28: Steinhardt's lab's impressive paper on "Approaching Human-Level Forecasting with Language Models" (press). The pipeline rephrases the question, lists arguments, ranks them, adjusts for biases, and then guesses the forecast. They note "The model can potentially generate weak arguments", and the appendix shows some good ones (decision trees) and some bad ones. (A rough sketch of this pipeline shape appears after this list of examples.)
March 11: Good Judgment's 50-superforecaster analysis of Covid-19 origins (substack). Reports that the forecasters used base rates, scientific evidence, geopolitical context, and views from intelligence communities, but not what these were. (Conversely, the RootClaim debate gives so much info that even Scott Alexander's summary is a dozen pages.) 10 of the 50 superforecasters ended with a dissenting belief.
March 18: Metaculus and Federation of American Scientists' pilot of forecasting the expected value of scientific projects. "[T]he research proposals lacked details about their research plans, what methods and experimental protocols would be used, and what preliminary research the author(s) had done so far. This hindered their ability to properly assess the technical feasibility of the proposals and their probability of success."
March 20: DeepMind's "Evaluating Frontier Models for Dangerous Capabilities", featuring Swift Centre forecasts (X). Reports forecaster themes: "Across all hypotheticals, there was substantial disagreement between individual forecasters." Lists a few cruxes but doesn't provide any complete arguments or models.
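To make the Feb 28 pipeline shape concrete, here is a minimal sketch in Python. It is my own illustration, not the paper's code: the function names, stub arguments, and the weighting constant are assumptions standing in for the language-model calls and bias adjustments the authors describe.

```python
# A minimal, hypothetical sketch of the pipeline shape described above
# (rephrase -> list arguments -> rank -> debias -> forecast). Function names,
# stub arguments, and weights are illustrative assumptions, not the paper's code.

from dataclasses import dataclass

@dataclass
class Argument:
    text: str
    direction: int    # +1 pushes the probability up, -1 pushes it down
    strength: float   # 0.0-1.0, assigned by the ranking step

def rephrase(question: str) -> str:
    # In the real pipeline this step would be a language-model call.
    return question.strip().rstrip("?") + "?"

def list_arguments(question: str) -> list[Argument]:
    # Stand-in for model-generated arguments; a real system would prompt an LLM.
    return [
        Argument("Base rates for similar events are low", -1, 0.6),
        Argument("Recent developments point toward resolution", +1, 0.4),
    ]

def rank(arguments: list[Argument]) -> list[Argument]:
    # Order arguments by estimated strength.
    return sorted(arguments, key=lambda a: a.strength, reverse=True)

def debias_and_forecast(arguments: list[Argument], base_rate: float = 0.5) -> float:
    # Shrink toward the base rate so a handful of weak arguments cannot swing
    # the forecast too far (a crude stand-in for the paper's bias adjustment).
    shift = sum(a.direction * a.strength for a in arguments) * 0.1
    return min(0.99, max(0.01, base_rate + shift))

question = "Will X happen by the end of 2025"
args = rank(list_arguments(rephrase(question)))
print(debias_and_forecast(args))  # 0.48 with the stub arguments above
```

The point of the sketch is simply that each intermediate step produces a legible artifact (a rephrased question, a ranked list of arguments) that could be inspected, which is exactly the material the reports above mostly do not publish.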
In these cases and the FRI collaboration, the forecasts are from top practitioners with great track records of accuracy (or "approaching" this, in the case of AI crowds). The questions are of the utmost importance.
Yet what can we learn from these? Dylan Matthews wrote last month in Vox about "the tight connection between forecasting and building a model of the world." Where is this model of the world?
FRI's adversarial collaboration did the best here. They list several "cruxes", and measu...