Cognixia Podcast

What are Large Language Models?


Hello everyone, and welcome back to the Cognixia podcast. To understand what large language models are, we first need to understand what transformer models are. As humans, we read text one word at a time and comprehend it accordingly, whereas machines see text as just a bunch of characters. For a long time, machines could not interpret text the way human beings can. That began changing when Vaswani et al. published the paper "Attention Is All You Need", which introduced the transformer model. A transformer model is based on the attention mechanism, which enables the machine to consider an entire sentence or even an entire paragraph at once instead of one character or one word at a time. Once the machine has consumed the entire input text, it can produce an output based on that input. This enables the transformer model to understand the context of the input and deliver better outputs. Transformer models are the basis of many other models commonly used in machine learning and generative AI today. They process data by tokenizing the input and then applying mathematical operations to those tokens in parallel to discover the relationships between them.
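To give a rough idea of what that attention mechanism looks like in practice, here is a minimal sketch of scaled dot-product attention, the core computation inside a transformer. The toy sentence, embeddings, and random projection matrices are made up purely for illustration; a real model learns these weights from data.

```python
# Minimal sketch of scaled dot-product attention (the core of a transformer),
# using NumPy. The toy token embeddings and random projection matrices below
# are made up purely for illustration - a real model learns them from data.
import numpy as np

rng = np.random.default_rng(0)

# Pretend we tokenized the sentence "large language models are powerful"
# into 5 tokens and embedded each one as an 8-dimensional vector.
num_tokens, d_model = 5, 8
embeddings = rng.normal(size=(num_tokens, d_model))

# Learned projections (random here) map embeddings to queries, keys, values.
W_q = rng.normal(size=(d_model, d_model))
W_k = rng.normal(size=(d_model, d_model))
W_v = rng.normal(size=(d_model, d_model))

Q, K, V = embeddings @ W_q, embeddings @ W_k, embeddings @ W_v

# Every token attends to every other token in the sentence at once:
# scores[i, j] says how much token i should "pay attention" to token j.
scores = Q @ K.T / np.sqrt(d_model)
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax

# Each output row is a context-aware mixture of all the value vectors.
output = weights @ V
print(weights.round(2))   # attention pattern over the whole sentence
print(output.shape)       # (5, 8): one context-aware vector per token
```

Because every token looks at every other token in the same step, the model sees the whole sentence at once rather than character by character.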

Large Language Models are, in a way, more advanced and complex versions of the transformer model. A large language model is a deep learning algorithm that can perform a variety of natural language processing tasks. Large language models use transformer models and are trained on very, very large data sets, which is also why they are called LARGE language models. Thanks to this extensive training and the powerful transformer models at their backbone, large language models are equipped to recognize, translate, predict, or generate text and other content. From understanding protein structures to writing code, these large language models can be trained to do a very wide range of things.
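As a concrete taste of that text-generation ability, here is a small sketch that asks a pretrained model to continue a prompt. It assumes the Hugging Face transformers library is installed and uses the small, publicly available GPT-2 checkpoint as a stand-in for a modern large language model.

```python
# Minimal sketch of asking a pretrained language model to generate text,
# assuming the Hugging Face "transformers" library is installed and using
# the publicly available GPT-2 checkpoint as a stand-in for a modern LLM.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Large language models can be trained to"
result = generator(prompt, max_new_tokens=30, do_sample=True, top_k=50)

# The model continues the prompt token by token, word by word.
print(result[0]["generated_text"])
```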
But how are transformer models and other machine learning models able to predict text? According to a very influential and interesting paper by Claude Shannon titled "Prediction and Entropy of Printed English", the English language has an entropy of about 2.1 bits per letter, even though it uses 27 symbols, that is, 26 letters plus the space. If those symbols appeared completely at random, the entropy would be about 4.8 bits per letter (log2 of 27). The fact that the real figure is so much lower means English text is highly redundant and therefore predictable, and it is exactly this predictability that machine learning models, especially transformer models, exploit to guess what comes next in a human language text. The model predicts the next word, appends it to the text, and repeats the process again and again, building entire paragraphs word by word that we then receive as an output.
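To make those numbers concrete, here is a small sketch that measures the single-letter entropy of a made-up sample passage and compares it with the roughly 4.8 bits per letter you would get if the 27 symbols were used at random. The sample text and variable names are invented for illustration; exploiting longer-range structure than single letters is what pushes the figure down toward Shannon's 2.1 bits.

```python
# Small sketch of Shannon-style entropy estimation for printed English.
# The sample text is made up for illustration; only letters and spaces count,
# matching the 27-symbol alphabet mentioned above.
import math
from collections import Counter

sample = (
    "large language models predict the next word in a sentence by learning "
    "how often letters and words follow one another in ordinary english text"
)

# Keep only the 27 symbols: a-z plus space.
symbols = [c for c in sample.lower() if c == " " or "a" <= c <= "z"]

counts = Counter(symbols)
total = len(symbols)

# H = -sum(p * log2(p)) over the observed symbol frequencies.
entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())

print(f"symbols used at random : {math.log2(27):.2f} bits per letter")  # ~4.75
print(f"single-letter estimate : {entropy:.2f} bits per letter")

# Shannon's 2.1 bits per letter comes from exploiting even more structure
# (letter pairs, whole words, longer context) than single-letter frequencies,
# which is exactly the kind of structure transformer models learn to use.
```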

Also, how does a transformer model, or any machine learning model, comprehend and deal with grammar? The model sees grammar as a pattern in how different words are used in a sentence or a context. It would be challenging for anyone to list out all the rules of grammar and then teach them to a machine learning model explicitly. Instead, the models acquire these grammar rules implicitly from examples. When the transformer model is large enough, as is the case for large language models, it can learn a lot more than just the grammar rules, extending these patterns beyond the specific examples it has been trained on.
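One way to see this implicitly learned grammar in action is to ask a masked language model to fill in a blank. This sketch assumes the Hugging Face transformers library and the publicly available bert-base-uncased checkpoint; the example sentence is made up.

```python
# Sketch: ask a pretrained masked language model to fill in a blank.
# Assumes the Hugging Face "transformers" library and the publicly available
# bert-base-uncased checkpoint; the example sentence is invented.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The model was never given a list of grammar rules, yet its top guesses
# for the blank tend to be grammatically correct verbs.
for prediction in fill("The dogs in the park [MASK] loudly."):
    print(f'{prediction["token_str"]:>10}  score={prediction["score"]:.3f}')
```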

