AhbarjietMalta

Introduction of GPT-3.5, InstructGPT



A major problem that large language models used to face was unfiltered output: AI-generated responses that could be untruthful, toxic, or irrelevant to the user. To address this, OpenAI fine-tuned its models with human feedback so that they follow instructions across a wide range of tasks. The resulting models, which start from supervised fine-tuning and are then trained with reinforcement learning from human feedback (RLHF), are referred to as InstructGPT.
Base Framework
In InstructGPT, labelers provide demonstrations of the intended behavior on the input prompt distribution. The prompts cover tasks such as generation, question answering, dialogue, summarization, extraction, and other natural language tasks, and they are overwhelmingly in English (96%). About 40 contractors contributed human feedback, and the training labelers agreed with each other approximately 73% of the time.
Model Specifications
When training InstructGPT, the labelers were asked to write three kinds of prompts: 1. plain prompts, where they come up with an arbitrary task; 2. few-shot prompts, where they give an instruction along with multiple query and response pairs for it; 3. user-based prompts, where the prompt corresponds to a use case stated by waitlisted users. The training data is split into three separate datasets, one per training stage: the SFT dataset contains labeler demonstrations, the reward model dataset contains labelers' rankings of earlier model outputs, and the PPO dataset contains prompts only, with no human labels.
Supervised fine-tuning (SFT): The model is fine-tuned on the labeler demonstrations for 16 epochs, using cosine learning rate decay and a residual dropout of 0.2.
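To make the cosine decay concrete, here is a minimal sketch of such a learning rate schedule. The peak learning rate, the number of steps, and the final fraction of 10% are illustrative assumptions, not values taken from the text above.

```python
import math

def cosine_decay_lr(step, total_steps, peak_lr, final_fraction=0.1):
    """Learning rate at `step`, decaying along a cosine curve from
    peak_lr down to final_fraction * peak_lr.

    final_fraction is an assumed value chosen for illustration.
    """
    # Clamp training progress to the range [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Cosine factor goes smoothly from 1 (start) to 0 (end).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    floor = final_fraction * peak_lr
    return floor + (peak_lr - floor) * cosine

# Hypothetical schedule: 1,000 steps with a peak learning rate of 1e-5.
for step in (0, 250, 500, 750, 1000):
    print(step, cosine_decay_lr(step, 1000, 1e-5))
```

The rate falls slowly at first, fastest in the middle of training, and flattens out again near the end.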
Reward modeling (RM): The model is trained to take in a prompt and a response and output a scalar reward. The difference in rewards between two responses represents the log odds that a human labeler will prefer one response to the other. For this stage they used reward models with 6B parameters rather than 175B.
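The log-odds statement can be sketched in a few lines. If the reward difference is the log odds of a preference, the preference probability is the logistic sigmoid of that difference, and a natural training loss is its negative log. The function names are hypothetical, and this is a sketch of the idea rather than the actual training code.

```python
import math

def preference_probability(reward_a, reward_b):
    """Probability that response A is preferred over response B,
    treating (reward_a - reward_b) as the log odds."""
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))

def pairwise_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood of the labeler's choice for one comparison."""
    return -math.log(preference_probability(reward_preferred, reward_rejected))

# Hypothetical scalar rewards for two responses to the same prompt.
p = preference_probability(1.5, 0.5)
print(p)                        # probability that the first response is preferred
print(math.log(p / (1.0 - p)))  # log odds, which recovers the reward difference
```

The loss is small when the reward model scores the labeler's preferred response higher and grows when it scores the rejected response higher.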
Reinforcement learning (RL): The environment is a bandit-style environment that presents a random customer prompt and expects a response to it. Given the prompt and the response, it produces a reward determined by the reward model and ends the episode. To keep the policy from over-optimizing the reward model, a per-token KL penalty from the SFT model is applied at each token. The value function is initialized from the RM. These models are called "PPO."
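A rough sketch of how the per-token KL penalty can be combined with the reward model score is shown below. The coefficient `beta`, the function name, and the choice to add the reward model score at the final token are assumptions made for illustration; they are not details confirmed by the text above.

```python
def shaped_rewards(rm_score, logprobs_policy, logprobs_sft, beta=0.02):
    """Per-token rewards for one episode (one prompt and one response).

    Each token is penalized by beta * (log pi_policy - log pi_sft), which
    discourages the policy from drifting away from the SFT model. The
    scalar reward model score is added at the last token (an assumed
    convention for this sketch).
    """
    if not logprobs_policy or len(logprobs_policy) != len(logprobs_sft):
        raise ValueError("need two non-empty log-probability lists of equal length")
    rewards = [-beta * (lp - ls) for lp, ls in zip(logprobs_policy, logprobs_sft)]
    rewards[-1] += rm_score
    return rewards

# Hypothetical log-probabilities for a three-token response.
print(shaped_rewards(1.0, [-0.5, -1.0, -0.2], [-0.7, -1.0, -0.9]))
```

When the policy assigns a token the same probability as the SFT model, the penalty for that token is zero. The more the policy raises a token's probability above the SFT model's, the larger the penalty.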
Results
As part of its broader work on NLP models, OpenAI also introduced a development that addresses the problem of infilling. The goal is to give models strong text infilling ability without compromising their ability to generate text and code normally from left to right. The team's method for transforming training data is remarkably simple: they move a random span of text from the middle of a document to its end.
The team shows that a causal autoregressive (AR) LLM can learn to fill in the middle (FIM) of a document, and handle related tasks like inferring import modules, writing docstrings, and completing functions, when it is jointly trained on a mixture of FIM-transformed data and traditional left-to-right data, across multiple objectives and datasets. Overall, FIM models can retain the same left-to-right capability as standard AR models while also learning to fill in the middle, which is why the training data transformation is said to provide FIM for free.
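The data transformation described above can be sketched as follows. The sentinel strings, the function names, and the 50% mixing rate are hypothetical placeholders for illustration. Character-level splitting is also an assumption.

```python
import random

# Hypothetical sentinel markers separating the three pieces of a document.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"

def fim_transform(document, rng):
    """Cut the document at two random points and move the middle span to the end."""
    i, j = sorted(rng.randint(0, len(document)) for _ in range(2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # The model sees prefix and suffix first, then learns to generate the middle.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"

def maybe_fim(document, rng, fim_rate=0.5):
    """Mix FIM-transformed and ordinary left-to-right examples (assumed rate)."""
    return fim_transform(document, rng) if rng.random() < fim_rate else document

rng = random.Random(0)
print(fim_transform("def add(a, b):\n    return a + b\n", rng))
```

Because the transformed example is still one plain sequence, it can be trained on with the ordinary left-to-right objective, with no change to the model architecture.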
At 175B parameters (the davinci models, the most recent update), outputs from InstructGPT are preferred by human labelers over GPT-3 outputs about 85% of the time, and over few-shot prompted GPT-3 outputs 71% of the time. In other words, almost 3 times out of 4, labelers prefer InstructGPT over a GPT-3 that has been conditioned to do well on the task at hand. Even prompt engineering is not enough to beat InstructGPT.
Figure 9.7: Evaluation of the final snapshots of models pretrained for 100B tokens without FIM and then fine-tuned for 25B (row a) and 50B (row b) tokens with FIM
[Source: InstructGPT paper]
To learn more about the technical aspects of GPT-3.5, refer to "Training language models to follow instructions with human feedback": https://tinyurl.com/yny5uux2
Cost Reduction in GPT-3 Model API Tokens
As the models improved over time, OpenAI's API pricing for the GPT-3 series was also reduced, most notably for the Davinci and Curie models: a cut of about 66%, from $0.06 to $0.02 per 1k tokens for Davinci and from $0.006 to $0.002 per 1k tokens for Curie. The OpenAI team's continued progress in making the models more efficient and more sustainable is what made the price reduction possible.
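The size of the reduction can be checked with a short calculation using the prices quoted above. The helper name is hypothetical.

```python
def percent_reduction(old_price, new_price):
    """Percentage by which new_price is lower than old_price."""
    return (old_price - new_price) / old_price * 100.0

# Prices in US dollars per 1k tokens, as quoted above.
davinci = percent_reduction(0.06, 0.02)
curie = percent_reduction(0.006, 0.002)
print(f"Davinci: {davinci:.1f}% lower, Curie: {curie:.1f}% lower")
```

Both models drop to one third of their previous price, so each reduction is two thirds.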
Introduction of Whisper
As part of building a better ecosystem for NLP, OpenAI also released Whisper, an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. The model is designed to be robust to background noise and other disturbances in the audio, bringing its output closer to what was actually said. It also handles a range of multilingual tasks and produces transcripts. The multilingual portion of the training data covers 98 different languages.
Overview of Whisper
The training dataset is built from diverse audio clips and is weighted toward real-life recordings, so that it better reflects how people actually speak. Whisper's architecture splits the input audio into 30-second chunks, converts each chunk into a mel spectrogram, and passes it to an encoder-decoder Transformer. The decoder predicts the corresponding text caption, intermixed with special tokens that instruct the single model to carry out tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and to-English speech translation. It comes in 9 model variants that differ in size and capability.
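The chunking step can be illustrated with a small sketch that splits a list of audio samples into fixed 30-second windows and zero-pads the last one. The 16 kHz sample rate and the function name are assumptions for illustration, and the mel spectrogram step is left out.

```python
SAMPLE_RATE = 16_000   # samples per second (assumed for this sketch)
CHUNK_SECONDS = 30     # window length described in the text

def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Split audio samples into fixed-length chunks, zero-padding the final chunk."""
    chunk_len = sample_rate * chunk_seconds
    chunks = []
    for start in range(0, len(samples), chunk_len):
        chunk = list(samples[start:start + chunk_len])
        # Pad a short final chunk with silence so every chunk has equal length.
        chunk.extend([0.0] * (chunk_len - len(chunk)))
        chunks.append(chunk)
    return chunks

# 70 seconds of placeholder audio becomes three 30-second chunks.
audio = [0.1] * (70 * SAMPLE_RATE)
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[0]))
```

Fixed-length chunks give the encoder inputs of a constant shape regardless of how long the original recording is.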
Figure 9.8: The process of text processing through the training pipeline
[Source: Whisper paper]
Other current methods usually rely either on larger but unsupervised audio pre-training datasets or on smaller, more closely paired audio-text training datasets. Because Whisper was trained on a broad and varied dataset rather than being tailored to any particular one, it does not outperform models that specialize in LibriSpeech, a highly competitive speech recognition benchmark.
Figure 9.9: The encoder-decoder model of Whisper
[Source: Whisper paper]
However, when its zero-shot performance is measured across a wide range of different datasets, it is far more robust and makes 50% fewer errors than comparable models.
Whisper's performance is close to that of professional human transcribers. It was evaluated on 25 recordings from the Kincaid46 dataset, comparing the word error rate (WER) distributions of Whisper, four commercial ASR systems, one computer-assisted human transcription service, and four human transcription services. The error ranges were broadly similar across all of them.
To learn more about the technical aspects of Whisper, refer to "Robust Speech Recognition via Large-Scale Weak Supervision": https://tinyurl.com/359y5t5y
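For readers unfamiliar with WER, the metric is the word-level edit distance between the reference transcript and the system output, divided by the number of reference words. The sketch below is a minimal illustration with a hypothetical function name. Real evaluations normally normalize the text (casing, punctuation) first, which is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat on a mat"))
```

In the example, one reference word is dropped and one is replaced, so two of the six reference words count as errors.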
Figure 9.10: The box plot is superimposed with dots indicating the WERs on individual recordings, and the aggregate WER over the 25 recordings is annotated on each box
[Source: Whisper paper]