The Introduction to GPT-3
Launch date: 28th May, 2020
About a year after the launch of GPT-2, OpenAI introduced GPT-3, the next and more advanced model in the GPT series, in the paper “Language Models are Few-Shot Learners”. OpenAI built GPT-3 with 175 billion parameters, aiming for a robust and powerful language model that needs little task-specific training and only a few demonstrations to understand a task and carry it out. The model has roughly 100 times more parameters than GPT-2 and ten times more than Microsoft’s Turing NLG language model. Because of its large parameter count and the sizable dataset it was trained on, GPT-3 performs well on downstream NLP tasks in zero-shot and few-shot settings. Its capacity lets it write articles that are difficult to distinguish from ones written by people. It can also perform on-demand tasks that it was never explicitly trained for, such as adding and subtracting numbers, generating SQL queries and code, unscrambling the words in a sentence, and writing React and JavaScript code from a natural-language task description.
Base Framework
Large language models acquire pattern recognition and other abilities from the text data they are trained on. While learning the core task of predicting the next word given the preceding context, a language model begins to recognize patterns in the data, and this helps it reduce the loss on the language modeling task. The model later draws on this ability during zero-shot task transfer. When given a few examples and/or a description of what needs to be done, the language model matches the pattern of the examples against what it learned from similar data during training and uses that knowledge to carry out the task. This is a powerful capability of large language models, and it grows stronger as the model’s parameter count rises.
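The next-word objective described above can be illustrated with a toy model. The sketch below is only an illustration: it uses bigram counts as a stand-in for a transformer, and the function names (train_bigram, next_word_probs, average_nll) and the smoothing constant alpha are assumptions made for this example, not anything from the GPT-3 paper. It shows that a model which has picked up patterns in the text gets a lower average next-word loss than one that has not.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev, vocab, alpha=1.0):
    """Smoothed probability of every vocabulary word following `prev`."""
    total = sum(counts[prev].values()) + alpha * len(vocab)
    return {word: (counts[prev][word] + alpha) / total for word in vocab}


def average_nll(counts, tokens, vocab):
    """Average negative log-likelihood of each next word (the LM loss)."""
    losses = []
    for prev, nxt in zip(tokens, tokens[1:]):
        prob = next_word_probs(counts, prev, vocab)[nxt]
        losses.append(-math.log(prob))
    return sum(losses) / len(losses)


text = "the cat sat on the mat the cat sat on the rug".split()
vocab = sorted(set(text))

untrained = defaultdict(Counter)   # no patterns learned: uniform guesses
trained = train_bigram(text)       # patterns learned from the text

print("untrained loss:", average_nll(untrained, text, vocab))
print("trained loss:  ", average_nll(trained, text, vocab))
```

The untrained model assigns every word the same probability, so its loss equals the logarithm of the vocabulary size; the trained model scores lower because it has learned which words tend to follow which.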
As stated earlier, the few-shot, one-shot, and zero-shot settings are special cases of zero-shot task transfer. In the few-shot setting, the model is given the task description and as many examples as will fit in its context window. In the one-shot setting the model is given exactly one example, and in the zero-shot setting it is given none. The model’s few-shot, one-shot, and zero-shot performance all improve as its capacity increases.
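The three settings differ only in how many worked examples are placed in the prompt. The sketch below shows one possible way to assemble such prompts. The helper names (approx_tokens, build_prompt), the "=>" separator, and the word-count stand-in for a real tokenizer are all assumptions made for illustration; the translation examples follow the style of the figure in the GPT-3 paper.

```python
def approx_tokens(text):
    """Crude stand-in for a real tokenizer: count whitespace-separated words."""
    return len(text.split())


def build_prompt(task_description, examples, query, context_budget=2048):
    """Build an in-context prompt, adding examples while they fit the budget."""
    footer = f"{query} =>"
    used = approx_tokens(task_description) + approx_tokens(footer)
    shots = []
    for source, target in examples:
        line = f"{source} => {target}"
        cost = approx_tokens(line)
        if used + cost > context_budget:
            break  # the next example would overflow the context window
        shots.append(line)
        used += cost
    return "\n".join([task_description, *shots, footer])


description = "Translate English to French:"
examples = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
    ("plush giraffe", "girafe peluche"),
]

zero_shot = build_prompt(description, [], "cheese")
one_shot = build_prompt(description, examples[:1], "cheese")
few_shot = build_prompt(description, examples, "cheese")

print(few_shot)
```

In every case the prompt ends with the unanswered query, and the model is expected to continue the pattern. No weights are updated; the examples influence the output only through the context.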
Figure 9.4: Image representing the context learning mechanism during training
[Source: GPT-3 paper]
GPT-3 was trained on a mixture of five corpora, each assigned a specific sampling weight. Higher-quality datasets were sampled more often, so the model saw some of them more than once during training. The five datasets were Common Crawl, WebText2, Books1, Books2, and Wikipedia, which together cover a broad range of textual and contextual patterns.
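Weighted sampling of this kind can be sketched in a few lines. The mixture weights below are the rounded percentages reported in the GPT-3 paper (they sum to slightly more than 1 because of rounding); the function name sample_sources and the fixed seed are assumptions for illustration, and the actual training pipeline is not described here.

```python
import random
from collections import Counter

# Rounded training-mix weights as reported in the GPT-3 paper.
MIXTURE_WEIGHTS = {
    "Common Crawl": 0.60,
    "WebText2": 0.22,
    "Books1": 0.08,
    "Books2": 0.08,
    "Wikipedia": 0.03,
}


def sample_sources(num_documents, seed=0):
    """Pick the source corpus for each training document by mixture weight."""
    rng = random.Random(seed)
    names = list(MIXTURE_WEIGHTS)
    weights = list(MIXTURE_WEIGHTS.values())
    # random.choices treats the weights as relative, so they need not sum to 1.
    return rng.choices(names, weights=weights, k=num_documents)


counts = Counter(sample_sources(10_000))
for name in MIXTURE_WEIGHTS:
    print(f"{name:13s} {counts[name]}")
```

Because the weight of a corpus is set independently of its size, a small but high-quality corpus such as Wikipedia can be revisited several times while a very large one is not fully consumed.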
Model Specifications
Like GPT-2, GPT-3 builds on the transformer-based architecture of the first GPT model, but this version differs from GPT-2 in a few major ways:
GPT-3 was evaluated in three in-context learning settings (zero-shot, one-shot, and few-shot) rather than through traditional fine-tuning.
GPT-3 has 96 layers with each layer having 96 attention heads.
The size of the word embeddings was increased from 1600 for GPT-2 to 12288 for GPT-3.
Context window size was increased from 1024 for GPT-2 to 2048 tokens for GPT-3.
Adam optimiser was used with β_1=0.9, β_2=0.95 and ε= 10^(-8).
Alternating dense and locally banded sparse attention patterns were used.
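The specifications above are enough to check the headline figure of 175 billion parameters with a back-of-the-envelope count. The sketch below uses the standard approximation for a transformer decoder (four attention projections plus a feed-forward block with a hidden width of four times the embedding width) and ignores biases and layer-norm parameters. It takes the embedding width as 12288, the value reported in the GPT-3 paper. The vocabulary size of 50257 is an assumption carried over from the GPT-2 tokenizer, and the function name estimate_parameters is made up for this example.

```python
N_LAYERS = 96        # transformer layers
N_HEADS = 96         # attention heads per layer
D_MODEL = 12288      # embedding width reported in the GPT-3 paper
N_CTX = 2048         # context window in tokens
VOCAB_SIZE = 50257   # assumption: same BPE vocabulary as GPT-2


def estimate_parameters(n_layers, d_model, n_ctx, vocab_size):
    """Approximate weight count, ignoring biases and layer norms."""
    attention = 4 * d_model ** 2      # query, key, value, output projections
    feed_forward = 8 * d_model ** 2   # two linear maps with 4x hidden width
    per_layer = attention + feed_forward
    embeddings = vocab_size * d_model + n_ctx * d_model  # token + position
    return n_layers * per_layer + embeddings


total = estimate_parameters(N_LAYERS, D_MODEL, N_CTX, VOCAB_SIZE)
head_dim = D_MODEL // N_HEADS

print(f"estimated parameters: {total / 1e9:.1f} billion")
print(f"dimension per attention head: {head_dim}")
```

The estimate comes out at about 174.6 billion, which agrees with the quoted 175 billion once the omitted terms are allowed for, and each of the 96 heads works with a 128-dimensional slice of the embedding.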
Evaluation
GPT-3 was tested on a variety of language modeling and NLP datasets. In the few-shot or zero-shot setting, GPT-3 outperformed state-of-the-art methods on language modeling datasets such as LAMBADA and Penn Treebank. On other datasets it could not surpass the state of the art, but it did improve on the zero-shot state of the art. On NLP tasks such as closed-book question answering, Winograd schema resolution, and translation, GPT-3 again performed well, frequently outperforming or coming close to fine-tuned models.
Figure 9.5: Four methods for performing a task with a language model
[Source: GPT-3 paper]
For most tasks, the model performed better in the few-shot setting than in the one-shot and zero-shot settings. On the CoQA benchmark, GPT-3 scored 81.5 F1 in the zero-shot setting, 84.0 F1 in the one-shot setting, and 85.0 F1 in the few-shot setting, compared with the 90.7 F1 achieved by the fine-tuned state of the art. On the TriviaQA benchmark, it reached 64.3%, 68.0%, and 71.2% accuracy in the zero-shot, one-shot, and few-shot settings respectively, with the few-shot result exceeding the state of the art (68%) by 3.2 percentage points. On the LAMBADA dataset, it reached 76.2%, 72.5%, and 86.4% accuracy in the zero-shot, one-shot, and few-shot settings respectively, with the few-shot result exceeding the state of the art (68%) by about 18 percentage points. In addition to traditional NLP tasks, the model was evaluated on more synthetic tasks, such as adding numbers, unscrambling words, generating news articles, and learning and using new words. On these tasks as well, the model performed better in the few-shot setting than in the one-shot and zero-shot settings, with performance increasing with the number of parameters.
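The F1 scores quoted for CoQA measure word overlap between the model's answer and a reference answer. The sketch below is a simplified version of that kind of metric: it lowercases and splits on whitespace only, whereas official evaluation scripts typically also strip punctuation and articles and compare against several reference answers. The function name token_f1 is an assumption for this example.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection: each shared token counts at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the white house", "white house"))
print(token_f1("in Paris", "in London"))
```

A benchmark score is the average of this per-answer value over all questions, multiplied by 100, so a partially correct answer earns partial credit rather than zero.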
To learn more about the technical aspects of GPT-3, refer to the paper “Language Models are Few-Shot Learners”: https://tinyurl.com/4ym9tehp
API Development of GPT-3
In June 2020, OpenAI released its API, which offers a general-purpose “text in, text out” interface. Unlike most AI systems, which are developed for a single use case, it lets users try it on essentially any English-language task. Users can request access in order to integrate the API into a product, build an entirely new application, or help explore the strengths and limitations of this technology.
Given any text prompt, the API returns a text completion that attempts to match the pattern you gave it. You can “program” it by showing it just a few examples of what you want it to do; its success generally varies with how complex the task is. The API also lets you improve performance on specific tasks by training on a dataset (small or large) of examples you provide, or by learning from human feedback supplied by users or labelers.
In September 2020, Microsoft announced that it was exclusively licensing GPT-3, allowing it to leverage the model’s technical innovations to develop and deliver advanced AI solutions for its customers and to create new solutions.
Figure 9.6: The process of input feeding in the InstructGPT model or GPT-3.5
[Source: InstructGPT Paper]
At the end of 2021, OpenAI made GPT-3 and its API publicly available to all users in supported countries. The release came with an improved Playground, which makes it easy to prototype with OpenAI’s models, an example library with dozens of prompts to help developers get started, and Codex, a new model that translates natural language into code.