NLP Highlights

98 - Analyzing Information Flow In Transformers, With Elena Voita

12.09.2019 - By Allen Institute for Artificial Intelligence

What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads on several tasks, and a gating mechanism that, combined with an auxiliary loss, prunes the number of effective heads. Then, we discuss Lena's work on studying the evolution of representations of individual tokens in transformer models.
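For a concrete picture of the pruning idea discussed in the episode, here is a minimal sketch of per-head stochastic gating with an L0-style auxiliary penalty, in the spirit of the first paper below (arXiv:1905.09418), which uses hard-concrete gates following Louizos et al. (2017). This is not the paper's actual code: the class name HeadGate, the hyperparameter values (beta, gamma, zeta, the 0.01 penalty weight), and the dummy task loss are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class HeadGate(nn.Module):
    """One stochastic gate per attention head, with a hard-concrete
    relaxation so gates can reach exactly 0 (head pruned) or 1 (head kept).
    A sketch of the technique, not the authors' implementation."""

    def __init__(self, n_heads, beta=0.33, gamma=-0.1, zeta=1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_heads))  # per-head gate logits
        self.beta, self.gamma, self.zeta = beta, gamma, zeta  # assumed values

    def forward(self):
        if self.training:
            # Reparameterized sample from the concrete distribution.
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta
            )
        else:
            s = torch.sigmoid(self.log_alpha)
        # Stretch to (gamma, zeta), then clip to [0, 1] so exact 0/1 are reachable.
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self):
        # Expected number of open gates; added to the task loss as the
        # auxiliary term that pushes unimportant heads toward zero.
        return torch.sigmoid(
            self.log_alpha
            - self.beta * torch.log(torch.tensor(-self.gamma / self.zeta))
        ).sum()

# Usage sketch: multiply each head's output by its gate, then add the penalty.
gate = HeadGate(n_heads=8)
head_outputs = torch.randn(2, 8, 10, 64)            # (batch, heads, seq, head_dim)
gated = head_outputs * gate()[None, :, None, None]  # broadcast gates over heads
total_loss = gated.pow(2).mean() + 0.01 * gate.l0_penalty()  # dummy task loss + L0
```

After training, heads whose gates collapse to zero contribute nothing and can be removed entirely, which is the sense in which the auxiliary loss "prunes the number of effective heads."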

Lena’s homepage:

https://lena-voita.github.io/

Blog posts:

https://lena-voita.github.io/posts/acl19_heads.html

https://lena-voita.github.io/posts/emnlp19_evolution.html

Papers:

https://arxiv.org/abs/1905.09418

https://arxiv.org/abs/1909.01380
