The Nonlinear Library

AF - Trends in the dollar training cost of machine learning systems by Ben Cottier



Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Trends in the dollar training cost of machine learning systems, published by Ben Cottier on February 1, 2023 on The AI Alignment Forum.
Summary
Using a dataset of 124 machine learning (ML) systems published between 2009 and 2022, I estimate that the cost of compute in US dollars for the final training run of ML systems has grown by 0.49 orders of magnitude (OOM) per year (90% CI: 0.37 to 0.56). See Table 1 for more detailed results, indicated by "All systems."
By contrast, I estimate that the cost of compute used to train "large-scale" systems (systems that used a relatively large amount of compute) since September 2015 has grown more slowly than the full sample, at a rate of 0.2 OOMs/year (90% CI: 0.1 to 0.4 OOMs/year). See Table 1 for more detailed results, indicated by "Large-scale."
Based on the historical trends, and on a review of some prior work (Lohn & Musser, 2022 and Cotra, 2020), I estimated my best guess for how quickly costs will grow in the future. Here, I'm assuming a model like the one used by Cotra (2020), where this growth rate is sustained until spending hits a limit at some non-trivial fraction of gross world product. The estimates below are much less robust than the historical trends. (more)
My independent impression: 0.3 OOMs/year (90% CI: 0.1 to 0.4 OOMs/year)
My all-things-considered view: 0.2 OOMs/year (90% CI: 0.1 to 0.3 OOMs/year)
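To make the Cotra-style model above concrete, here is a minimal sketch (not from the report) of how an OOMs/year growth rate translates into years until training cost reaches a spending limit. All numbers are illustrative assumptions, not figures from the report:

```python
import math

def years_until_limit(start_cost, growth_oom_per_year, limit):
    """Years for a cost growing at `growth_oom_per_year` orders of
    magnitude per year to reach `limit`, since
    cost(t) = start_cost * 10 ** (growth_oom_per_year * t)."""
    return math.log10(limit / start_cost) / growth_oom_per_year

start = 1e7          # assumption: a ~$10M frontier training run today
limit = 0.01 * 1e14  # assumption: spending cap at 1% of ~$100T gross world product

# At 0.2 OOMs/year, 5 OOMs of headroom takes 5 / 0.2 = 25 years
print(years_until_limit(start, 0.2, limit))  # -> 25.0
```

The point of the sketch is that the limit's exact value matters less than the growth rate: each extra 0.1 OOMs/year removes years from the runway roughly in proportion.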
For future work, I recommend the following:
Incorporate systems trained on Google TPUs, and TPU price-performance data, into Method 2.
Estimate more reliable bounds on training compute costs, rather than just point estimates. For example, research the profit margin of NVIDIA and adjust retail prices by that margin to get a lower bound on hardware cost.
As a broader topic, investigate trends in investment, spending allocation, and AI revenue.
| Estimation method (go to explanation) | Data | Period | Scale (start to end) | Growth rate in dollar cost for final training runs |
|---|---|---|---|---|
| (1) Using the overall GPU price-performance trend (go to results) | All systems (n=124) | Jun 2009–Jul 2022 | $0.02 to $80K | 0.51 OOMs/year (90% CI: 0.45 to 0.57) |
| | Large-scale (n=25) | Oct 2015–Jun 2022 | $30K to $1M | 0.2 OOMs/year (90% CI: 0.1 to 0.4) |
| (2) Using the peak price-performance of the actual NVIDIA GPUs used to train ML systems (go to results) | All systems (n=48) | Jun 2009–Jul 2022 | $0.10 to $80K | 0.44 OOMs/year (90% CI: 0.34 to 0.52) |
| | Large-scale (n=6) | Sep 2016–May 2022 | $200 to $70K | 0.2 OOMs/year (90% CI: 0.1 to 0.4) |
| Weighted mixture of growth rates | All systems | Jun 2009–Jul 2022 | N/A | 0.49 OOMs/year (90% CI: 0.37 to 0.56) |
Table 1: Estimated growth rate in the dollar cost of compute to train ML systems over time, based on a log-linear regression. OOM = order of magnitude (10x). See the section Summary of regression results for expanded result tables.
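The log-linear regression behind Table 1 can be sketched as fitting log10(cost) against publication date, so that the slope is directly the growth rate in OOMs/year. The data points below are made up for illustration; they are not the report's dataset:

```python
import numpy as np

# Hypothetical (publication year, training cost in USD) pairs
years = np.array([2012.0, 2014.5, 2017.0, 2019.5, 2022.0])
costs = np.array([1e2, 1e3, 5e4, 1e6, 1e7])

# Degree-1 fit in log space: slope is OOMs/year, intercept is log10($) at year 0
slope, intercept = np.polyfit(years, np.log10(costs), 1)
print(f"growth rate: {slope:.2f} OOMs/year")  # -> growth rate: 0.52 OOMs/year
```

Confidence intervals like those in Table 1 would then come from the uncertainty on the fitted slope (e.g. via bootstrapping), not from this point estimate alone.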
Figure 1: Estimated cost of compute in US dollars for the final training run of ML systems. The costs here are estimated based on the trend in price-performance for all GPUs in Hobbhahn & Besiroglu (2022) (known as "Method 1" in this report).
Read the rest of the report here
These are "milestone" systems selected from the database Parameter, Compute and Data Trends in Machine Learning, using the same criteria as described in Sevilla et al. (2022, p.16): "All models in our dataset are mainly chosen from papers that meet a series of necessary criteria (has an explicit learning component, showcases experimental results, and advances the state-of-the-art) and at least one notability criterion (>1000 citations, historical importance, important SotA advance, or deployed in a notable context). For new models (from 2020 onward) it is harder to assess these criteria, so we fall back to a subjective selection. We refer to models meeting our selection criteria as milestone models."
This growth rate is about 0.2 OOM/year lower than the growth of t...