Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GPT-4 Predictions, published by Stephen McAleese on February 17, 2023 on LessWrong.
Introduction
GPT-4 is OpenAI’s next major language model, which is expected to be released at some point in 2023. My goal here is to get some idea of when it will be released and what it will be capable of. I also think it will be interesting in retrospect to see how accurate my predictions were. This post is partially inspired by Matthew Barnett’s GPT-4 Twitter thread, which I recommend reading.
Background of GPT models
GPT-1, GPT-2, GPT-3
GPT stands for generative pre-trained transformer and is a family of language models created by OpenAI. GPT-1 was released in 2018, GPT-2 in 2019, and GPT-3 in 2020. All three models use a similar architecture with some relatively minor variations: a dense, text-only, decoder-only transformer language model that’s trained using unsupervised learning to predict the next token in its text training set.
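To make that training objective concrete, here is a toy sketch of next-token prediction loss. This is not OpenAI's code; the probability distribution is hand-written and stands in for what a model would output.

```python
import math

# Toy illustration of the autoregressive objective: the model assigns a
# probability to each candidate next token and is trained to minimize
# the negative log-probability of the token that actually comes next.

def next_token_loss(predicted_probs: dict, true_token: str) -> float:
    """Cross-entropy loss (in nats) for a single next-token prediction."""
    return -math.log(predicted_probs[true_token])

# Hypothetical distribution for the context "the cat sat on the ..."
probs = {"mat": 0.6, "roof": 0.3, "moon": 0.1}
print(next_token_loss(probs, "mat"))   # ~0.51: model was confident and right
print(next_token_loss(probs, "moon"))  # ~2.30: model was surprised
```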
InstructGPT, GPT-3.5, ChatGPT
Arguably the biggest change in the series so far, in terms of both architecture and behavior, came with the release of InstructGPT in January 2022. In addition to the standard unsupervised pre-training, InstructGPT used supervised fine-tuning on human-written model answers, plus reinforcement learning from human feedback (RLHF), in which human labelers rank model responses.
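For a sense of how that ranking step works mechanically, here is a minimal sketch of the pairwise preference loss used to train RLHF reward models (the loss form described in the InstructGPT paper). The reward scores below are hypothetical scalars such a model would output; this is not OpenAI's actual code.

```python
import math

# Pairwise ranking loss for a reward model: given scalar reward scores
# for a human-preferred and a rejected response, minimize
# -log(sigmoid(r_preferred - r_rejected)).

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def ranking_loss(r_preferred: float, r_rejected: float) -> float:
    return -math.log(sigmoid(r_preferred - r_rejected))

print(ranking_loss(1.2, -0.3))  # small loss: reward model already agrees
print(ranking_loss(-0.5, 0.8))  # large loss: reward model ranks them backwards
```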
The GPT-3.5 models finished training and were released in 2022, and demonstrated higher-quality answers than GPT-3. In late 2022, OpenAI released ChatGPT, which is based on GPT-3.5 and fine-tuned for conversation.
When will GPT-4 be released?
Sam Altman, the CEO of OpenAI, was interviewed by StrictlyVC in January 2023. When asked when GPT-4 would come out, he replied, “It will come out at some point when we are confident that we can do it safely and responsibly.”
Metaculus predicts a 50% chance that GPT-4 will be released by May 2023 and a ~93% chance that it will be released by the end of 2023.
It seems like there’s still a lot of uncertainty here, but I think we can be fairly confident that GPT-4 will be released at some point in 2023.
What will GPT-4 be like?
Altman revealed some more details about GPT-4 at an AC10 meetup Q&A. He said:
GPT-4 will be a text-only model like GPT-3.
GPT-4 won’t be much bigger than GPT-3 but will use much more compute and have much better performance.
GPT-4 will have a longer context window.
How capable will GPT-4 be?
Scaling laws
According to the paper Scaling Laws for Neural Language Models (2020), model performance as measured by cross-entropy loss can be predicted from three factors: the number of parameters in the model, the amount of compute used during training, and the amount of training data. There is a power-law relationship between each of these factors and the loss: every 10x increase in compute, data, or parameters multiplies the loss by a roughly constant factor, a 100x increase applies that factor twice, and so on. The authors of the paper recommended training very large models on relatively small amounts of data, investing extra compute in more parameters rather than in more training steps or data, to minimize loss.
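Here is a sketch of the compute power law from the paper, L(C) = (C_c / C)^α_C. The constants below are my recollection of the paper's approximate fitted values, so treat them as illustrative rather than exact.

```python
# Compute scaling law: L(C) = (C_c / C) ** ALPHA_C.
ALPHA_C = 0.050   # fitted compute exponent (approximate)
C_C = 3.1e8       # fitted constant, in PF-days of compute (approximate)

def loss_from_compute(compute_pf_days: float) -> float:
    """Predicted cross-entropy loss from training compute in PF-days."""
    return (C_C / compute_pf_days) ** ALPHA_C

# Every 10x increase in compute multiplies the loss by 10**-0.050 ~ 0.89:
for c in (1e2, 1e3, 1e4):
    print(f"{c:.0e} PF-days -> predicted loss {loss_from_compute(c):.3f}")
```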
For every 10x increase in compute, the paper recommends increasing the number of parameters by roughly 5x, the number of training tokens by roughly 2x, and the number of serial training steps by roughly 1.2x. This explains why the original GPT-3 model and other models such as Megatron and PaLM were so large.
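A toy sketch of how those multipliers compound across successive 10x jumps in compute, using hypothetical starting values:

```python
# Allocation rule quoted above: per 10x more compute, use ~5x more
# parameters, ~2x more training tokens, and ~1.2x more serial steps.
params, tokens, steps = 1.5e9, 3.0e11, 1.0e5  # hypothetical starting run

for power in (1, 2, 3):  # 10x, 100x, 1000x more compute
    params, tokens, steps = params * 5, tokens * 2, steps * 1.2
    print(f"compute x{10**power:>4}: params {params:.1e}, "
          f"tokens {tokens:.1e}, serial steps {steps:.1e}")

# params * tokens grows ~10x per row (5 * 2), consistent with training
# compute scaling roughly as parameters times tokens processed.
```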
However, the new scaling laws from DeepMind’s 2022 paper Training Compute-Optimal Large Language Models instead emphasize the importance of training data for minimizing loss. Instead of prioritizing more parameters, the paper recommends scaling the number of parameters and the number of training tokens equally.
DeepMind originally trained a large 280B parameter model named Gopher but then found that a 70B parameter model, Chinchilla, trained on about 1.4 trillion tokens, outperformed it despite being four times smaller.
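As a rough illustration, here is a sketch of the popular rule-of-thumb reading of the paper: roughly 20 training tokens per parameter, with training FLOPs approximated as 6 × parameters × tokens. These constants are approximations derived from the paper's results, not its exact fitted law.

```python
import math

# Chinchilla-style allocation: scale parameters and tokens together,
# with ~20 tokens per parameter and training FLOPs C ~ 6 * N * D.
TOKENS_PER_PARAM = 20

def compute_optimal(flops: float) -> tuple:
    """Return (params, tokens) that are roughly compute-optimal."""
    # C = 6 * N * D with D = 20 * N  =>  C = 120 * N**2
    n = math.sqrt(flops / (6 * TOKENS_PER_PARAM))
    return n, TOKENS_PER_PARAM * n

# Chinchilla's budget of roughly 5.8e23 FLOPs recovers a ~70B parameter
# model trained on ~1.4T tokens, matching the model described above.
n, d = compute_optimal(5.8e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```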