Marketing^AI

Deriving Phrase-Level Attention from BERT Models



This episode discusses methods for obtaining phrase- and clause-level attention from BERT-based models, which primarily operate at the token level. It explains how standard BERT attention works and highlights the granularity challenge that arises when trying to understand relationships between larger semantic units. Several approaches are outlined: aggregating existing token-level attention, adapting hierarchical attention networks, leveraging span-based or sparse attention mechanisms, and explicitly incorporating syntactic structure. The episode emphasizes that deriving meaningful phrase-level attention often requires modifying the model or relying on external linguistic tools, and concludes by summarizing the pros and cons of each technique and outlining directions for future research.
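As a rough illustration of the first approach mentioned above (aggregating existing token attention), the sketch below pools BERT's token-to-token attention weights over phrase spans so that each pair of phrases receives a single score. It assumes the Hugging Face transformers library; the model name, the hand-picked phrase spans, and the choice of mean pooling over heads and tokens are illustrative assumptions, not the episode's prescribed method. In practice the spans would come from a chunker or constituency parser.

```python
# Sketch: aggregate BERT token-level attention into phrase-level attention.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

sentence = "The quick brown fox jumps over the lazy dog"
# Hypothetical phrase spans as (start, end) word indices; normally produced by
# a chunker or parser rather than written by hand.
phrase_spans = [(0, 4), (4, 6), (6, 9)]  # "The quick brown fox", "jumps over", "the lazy dog"

enc = tokenizer(sentence, return_tensors="pt")
with torch.no_grad():
    out = model(**enc)

# Last-layer attention averaged over heads: shape (seq_len, seq_len).
attn = out.attentions[-1][0].mean(dim=0)

# Map each word-level span to the word-piece token positions it covers.
word_ids = enc.word_ids(0)  # token index -> word index (None for special tokens)

def token_positions(span):
    start, end = span
    return [i for i, w in enumerate(word_ids) if w is not None and start <= w < end]

# Phrase-to-phrase attention = mean of the token-to-token block for each span pair.
n = len(phrase_spans)
phrase_attn = torch.zeros(n, n)
for i, src in enumerate(phrase_spans):
    for j, tgt in enumerate(phrase_spans):
        rows, cols = token_positions(src), token_positions(tgt)
        phrase_attn[i, j] = attn[rows][:, cols].mean()

print(phrase_attn)
```

Mean pooling is only one choice; max pooling or attention-mass summation over the span block are equally simple variants, and which one yields the most faithful phrase-level picture is one of the trade-offs the episode weighs.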


By Enoch H. Kang