NLP Highlights

98 - Analyzing Information Flow In Transformers, With Elena Voita


What function do the different attention heads serve in multi-headed attention models? In this episode, Lena describes how to use attribution methods to assess the importance and contribution of different heads on several tasks, and presents a gating mechanism that, combined with an auxiliary loss, prunes the number of effective heads. We then discuss Lena’s work on how the representations of individual tokens evolve across the layers of a transformer model.
Lena’s homepage:
https://lena-voita.github.io/
Blog posts:
https://lena-voita.github.io/posts/acl19_heads.html
https://lena-voita.github.io/posts/emnlp19_evolution.html
Papers:
https://arxiv.org/abs/1905.09418
https://arxiv.org/abs/1909.01380
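To make the head-pruning idea from the episode concrete, here is a minimal PyTorch sketch of per-head gates with a hard-concrete (stretched, clipped sigmoid) relaxation and an expected-L0 auxiliary penalty, in the spirit of Voita et al. (2019), which builds on Louizos et al. (2018). The class name HeadGates, the hyperparameter values, and the penalty weight are illustrative assumptions, not the paper’s released code.

```python
# A minimal sketch of head gating with an L0-style auxiliary loss (assumptions noted below)
import math
import torch
import torch.nn as nn

class HeadGates(nn.Module):
    """Learnable per-head gates with a hard-concrete relaxation
    (Louizos et al., 2018), as used for head pruning in Voita et al. (2019).
    Hyperparameter values here are illustrative defaults, not the paper's."""

    def __init__(self, n_heads: int, beta: float = 0.5,
                 gamma: float = -0.1, zeta: float = 1.1):
        super().__init__()
        self.log_alpha = nn.Parameter(torch.zeros(n_heads))  # one gate logit per head
        self.beta, self.gamma, self.zeta = beta, gamma, zeta

    def forward(self) -> torch.Tensor:
        if self.training:
            # sample a concrete variable per head (sigmoid of logistic noise)
            u = torch.rand_like(self.log_alpha).clamp(1e-6, 1 - 1e-6)
            s = torch.sigmoid(
                (torch.log(u) - torch.log(1 - u) + self.log_alpha) / self.beta
            )
        else:
            s = torch.sigmoid(self.log_alpha)
        # stretch to (gamma, zeta), then clip: gates can hit exactly 0 or 1
        return (s * (self.zeta - self.gamma) + self.gamma).clamp(0.0, 1.0)

    def l0_penalty(self) -> torch.Tensor:
        # expected number of non-zero gates; add lambda * this to the task loss
        return torch.sigmoid(
            self.log_alpha - self.beta * math.log(-self.gamma / self.zeta)
        ).sum()

if __name__ == "__main__":
    gates = HeadGates(n_heads=8)
    head_out = torch.randn(2, 8, 16, 64)      # (batch, heads, seq, d_head)
    g = gates()                               # one gate per head, in [0, 1]
    gated = head_out * g.view(1, -1, 1, 1)    # scale each head's output by its gate
    aux = 0.01 * gates.l0_penalty()           # 0.01 is an illustrative weight
    print(gated.shape, float(aux))
```

The stretch-and-clip step is what lets gates reach exactly zero during training, so heads whose gates converge to zero can simply be removed at test time.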

NLP Highlights, by the Allen Institute for Artificial Intelligence

4.3 (23 ratings)

