Data Science at Home

More powerful deep learning with transformers (Ep. 84) (Rebroadcast)

11.27.2019 - By Francesco Gadaleta

Some of the most powerful NLP models, like BERT and GPT-2, have one thing in common: they are all built on the transformer architecture. That architecture, in turn, rests on another concept already well known to the community: self-attention. In this episode I explain what these mechanisms are, how they work, and why they are so powerful.
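
As a quick companion to the episode, below is a minimal sketch of the scaled dot-product self-attention at the heart of the transformer, as described in "Attention is all you need". It is an illustration, not a production implementation: the function name self_attention, the projection matrices Wq, Wk, Wv, and the toy dimensions are all hypothetical choices.

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        """Single-head scaled dot-product self-attention over token embeddings X."""
        Q = X @ Wq                                # queries (seq_len, d_k)
        K = X @ Wk                                # keys    (seq_len, d_k)
        V = X @ Wv                                # values  (seq_len, d_v)
        d_k = K.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)           # pairwise compatibility, scaled by sqrt(d_k)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ V                        # each output is a weighted mix of all value vectors

    # Toy example: 4 tokens, embedding size 8, head size 4 (sizes are arbitrary).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((4, 8))
    Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)    # (4, 4)

Each row of the softmaxed score matrix tells a token how strongly to attend to every other token; a full transformer stacks many such heads and layers on top of this mechanism.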

Don't forget to subscribe to our newsletter or join the discussion on our Discord server.

References

Attention Is All You Need: https://arxiv.org/abs/1706.03762

The Illustrated Transformer: https://jalammar.github.io/illustrated-transformer

Self-Attention for Generative Models (Stanford CS224N lecture slides): http://web.stanford.edu/class/cs224n/slides/cs224n-2019-lecture14-transformers.pdf
