
Breaking down how Large Language Models work
Here are a few other relevant resources:
Build a GPT from scratch, by Andrej Karpathy
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
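To make the "combined low-rank map" idea concrete, here is a minimal NumPy sketch. It is only an illustration under assumed toy sizes: the names `W_V`, `W_O`, `d_model`, and `d_head` are hypothetical, and the matrices are random rather than taken from any trained model. The point is just that applying the value matrix and then the output matrix is the same as applying one square matrix on the embedding space, whose rank cannot exceed the smaller head dimension.

```python
import numpy as np

# Toy sizes, chosen only for illustration (assumptions, not from any real model).
d_model = 64  # dimension of the embedding space
d_head = 8    # smaller dimension used inside one attention head

rng = np.random.default_rng(0)

# Value matrix: maps an embedding down into the small head space.
W_V = rng.standard_normal((d_head, d_model))
# Output matrix: maps from the head space back up to the embedding space.
W_O = rng.standard_normal((d_model, d_head))

# The combined map sends the embedding space to itself.
W_OV = W_O @ W_V

# Applying the two matrices in sequence matches applying the combined matrix once.
x = rng.standard_normal(d_model)
two_step = W_O @ (W_V @ x)
one_step = W_OV @ x

print("combined shape:", W_OV.shape)
print("rank is at most d_head:", np.linalg.matrix_rank(W_OV) <= d_head)
print("same result either way:", np.allclose(two_step, one_step))
```

Although `W_OV` is a full `d_model` by `d_model` matrix, it is built from far fewer parameters than a general matrix of that shape, which is what "low-rank" refers to here.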
History of language models by Brit Cruise, @ArtOfTheProblem
An early paper on how directions in embedding spaces have meaning:
Russian audio track: Vlad Burmistrov.
Timestamps
0:00 - Predict, sample, repeat