Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Discounts in cost-effectiveness analyses [Founders Pledge], published by Rosie Bettle on August 16, 2023 on The Effective Altruism Forum.
Replicability and Generalisability
This report aims to provide guidelines for producing discounts within cost-effectiveness analyses: how to take an effect size from an RCT and apply discounts to best predict the real-world effect of an intervention. The goal is to have guidelines that produce accurate estimates and are practical for researchers to use. These guidelines are likely to be updated further, and I especially invite suggestions and criticism for the purpose of further improvement. A Google Docs version of this report is available here.
Acknowledgements: I would like to thank Matt Lerner, Filip Murár, David Reinstein and James Snowden for helpful comments on this report. I would also like to thank the rest of the FP research team for helpful comments during a presentation on this report.
Summary
This document provides guidelines for estimating the discounts that we (Founders Pledge) apply to RCTs in our cost-effectiveness analyses for global health and development charities. To skip directly to these guidelines, go to the 'Guidance for researchers' sections (here, here and here; one for each type of discount).
I think that we should separate discounts into internal reliability and external validity adjustments, because these components have different causes (see Fig. 1).
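As a purely illustrative sketch (not the report's own formula), the two components can be thought of as separate adjustment factors applied to the RCT effect size; the multiplicative combination and all of the numbers below are my assumptions:

```python
# Illustrative only: assumes (my assumption) that the internal reliability and
# external validity adjustments combine multiplicatively; all numbers are hypothetical.
rct_effect = 0.30            # effect size reported by the RCT
internal_reliability = 0.50  # e.g. a 50% Type M discount
external_validity = 0.80     # e.g. a 20% discount for the new context

adjusted_effect = rct_effect * internal_reliability * external_validity
print(f"Effect size used in the CEA: {adjusted_effect:.2f}")  # 0.12
```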
For internal reliability (the degree to which the study accurately assesses the intervention in the specific context of the study: if an exact replication of the study were carried out, would we see the same effect?);
All RCTs will need a Type M adjustment: an adjustment that corrects for potential inflation of the effect size (Type M error). The RCTs that are likely to have the most inflated effect sizes are those that are low-powered (where the statistical test used has only a small chance of successfully detecting a true effect; see more info here), especially if they provide some of the first evidence for the effect. Factors to account for include publication bias, researcher bias (e.g. motivated reasoning to find an exciting result, or running a variety of statistical tests and only reporting the ones that reach statistical significance), and methodological errors (e.g. inadequate randomisation of trial subjects). See here for guidelines, and here to assess power.
Many RCTs are likely to need a 50-60% Type M discount, but there is a lot of variation here; Table 1 can help to sense-check Type M adjustments.
A small number (<~15%) of RCTs will need a Type S adjustment, to account for the possibility that the sign of the effect is in the wrong direction. This applies to RCTs that provide some of the first evidence for an effect, are underpowered, and for which it is mechanistically plausible that the effect could go in the other direction. See here for guidelines.
The likelihood of Type S error can be estimated mathematically (e.g. via the retrodesign R package).
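Since the transcript cannot show the retrodesign output directly, here is a minimal Python sketch of the same design-analysis calculation (in the style of Gelman & Carlin), assuming a normal approximation; the true-effect and standard-error inputs are hypothetical and chosen only for illustration:

```python
# A minimal sketch of the design-analysis calculation that the retrodesign R
# package performs, using a normal approximation and simulation.
# The inputs below are hypothetical, not taken from the report.
import numpy as np
from scipy import stats

def design_analysis(true_effect, se, alpha=0.05, n_sims=100_000, seed=0):
    """Return power, Type S error rate, and exaggeration ratio (Type M)."""
    z = stats.norm.ppf(1 - alpha / 2)                 # critical value, e.g. 1.96
    rng = np.random.default_rng(seed)
    estimates = rng.normal(true_effect, se, n_sims)   # hypothetical replications
    significant = np.abs(estimates) > z * se          # replications reaching p < alpha
    power = significant.mean()
    # Type S: among significant results, how often is the sign wrong?
    type_s = (np.sign(estimates[significant]) != np.sign(true_effect)).mean()
    # Type M (exaggeration ratio): average significant |estimate| vs the true effect
    exaggeration = np.abs(estimates[significant]).mean() / abs(true_effect)
    return power, type_s, exaggeration

# Example: a study whose standard error equals the assumed true effect
power, type_s, exaggeration = design_analysis(true_effect=0.2, se=0.2)
print(f"power ~ {power:.2f}, Type S ~ {type_s:.3f}, exaggeration ratio ~ {exaggeration:.2f}")
```

With these hypothetical inputs the power comes out around 0.17 and the exaggeration ratio around 2.5, so the implied Type M discount (roughly 1 - 1/2.5, i.e. about 60%) lands near the top of the 50-60% range mentioned above; this is only a rough sense-check, not a substitute for the report's Table 1.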
For external validity (how that result generalises to a different context, e.g. when using an RCT effect size to estimate how well an intervention will work in a different area), we should expect the effect size to vary between contexts by around 99% of the absolute median effect (so a 1X effect size could be anywhere from ~0X to ~2X in a different context; see Vivalt 2020). Note that the effect in the new context could be larger than our estimate of the true effect, but I expect that in most cases it will be smaller. Following Duflo & Banerjee (2017), we should attend to;
Specific sample differences (do the conditions necessary for the intervention to work hold in the new context?)
Equilibrium effects (will there be emergent effects of the intervention, when...