
Demystifying attention, the key mechanism inside transformers and LLMs.
Demystifying self-attention, multiple heads, and cross-attention.
The first pass for the translated subtitles here is machine-generated, and therefore notably imperfect. To contribute edits or fixes, visit https://www.criblate.com
Russian audio track: Влад Бурмистров (Vlad Burmistrov).
And yes, at 22:00 (and elsewhere), "breaks" is a typo.
Here are a few other relevant resources:
Build a GPT from scratch, by Andrej Karpathy
If you want a conceptual understanding of language models from the ground up, @vcubingx just started a short series of videos on the topic:
If you're interested in the herculean task of interpreting what these large networks might actually be doing, the Transformer Circuits posts by Anthropic are great. In particular, it was only after reading one of these that I started thinking of the combination of the value and output matrices as being a combined low-rank map from the embedding space to itself, which, at least in my mind, made things much clearer than other sources.
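To make the "combined low-rank map" idea concrete, here is a minimal sketch in pure Python. It assumes the common column-vector convention, in which a single head's value matrix W_V maps the embedding space down to a smaller head space and the output matrix W_O maps back up. The sizes (D_EMBED = 8, D_HEAD = 2) and the random integer weights are hypothetical stand-ins for learned parameters. Because W_O @ W_V factors through the D_HEAD-dimensional space, the product is an embedding-to-embedding map whose rank can be at most D_HEAD:

```python
import random
from fractions import Fraction

def matmul(a, b):
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def rank(m):
    """Exact matrix rank via Gaussian elimination over rationals."""
    rows = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row.
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Hypothetical sizes: a small embedding space and a much smaller head space.
D_EMBED = 8
D_HEAD = 2

rng = random.Random(0)
# W_V: D_HEAD x D_EMBED, maps an embedding vector down to the head space.
W_V = [[rng.randint(-3, 3) for _ in range(D_EMBED)] for _ in range(D_HEAD)]
# W_O: D_EMBED x D_HEAD, maps a head-space vector back up to the embedding space.
W_O = [[rng.randint(-3, 3) for _ in range(D_HEAD)] for _ in range(D_EMBED)]

# The combined map takes the embedding space to itself (D_EMBED x D_EMBED),
# but its rank is bounded by D_HEAD because it factors through the head space.
combined = matmul(W_O, W_V)
print(len(combined), len(combined[0]), rank(combined))
```

The exact-rational rank function is only there to avoid floating-point tolerance questions in such a small example; with a numerical library you would normally use an SVD-based rank instead.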
Site with exercises related to ML programming and GPTs
History of language models by Brit Cruise, @ArtOfTheProblem
An early paper on how directions in embedding spaces have meaning:
Timestamps:
These animations are largely made using a custom Python library, manim. See the FAQ comments here:
All code for specific videos is visible here:
The music is by Vincent Rubinetti.
3blue1brown is a channel about animating math, in all senses of the word animate. If you're reading the bottom of a video description, I'm guessing you're more interested than the average viewer in lessons here. It would mean a lot to me if you chose to stay up to date on new ones, either by subscribing here on YouTube or otherwise following on whichever platform below you check most regularly.
Mailing list: https://3blue1brown.substack.com