Hey guys, in this episode I explain most of what I know about Transformers. I talk about the architecture, the attention formula, the encoder and decoder, self-supervised learning, positional encoding, tokenization, inductive bias, Vision Transformers, receptive fields...
It was the most technical episode I've recorded so far, and I hope you like it! By the way, it's worth listening to this episode with the Transformer paper ("Attention Is All You Need") in hand.
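In case it helps to have the key equation in front of you while listening, the attention formula from that paper is the scaled dot-product attention: Attention(Q, K, V) = softmax(QKᵀ / √d_k) V, where Q, K, and V are the query, key, and value matrices and d_k is the dimension of the keys.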