Longformer: The Long-Document Transformer


Ref: https://arxiv.org/abs/2004.05150


The paper introduces Longformer, a Transformer model designed to process long sequences efficiently. It addresses the quadratic complexity of standard self-attention with an attention mechanism that scales linearly with sequence length, combining local windowed attention with task-motivated global attention. The authors demonstrate Longformer's effectiveness on character-level language modeling (text8 and enwik8) and on a range of downstream tasks, achieving state-of-the-art results. They also introduce Longformer-Encoder-Decoder (LED), a variant for sequence-to-sequence tasks, and show its success on long-document summarization. The gains in efficiency and performance come from these architectural modifications together with a staged training procedure.
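To make the attention pattern concrete, here is a minimal NumPy sketch of the combined mask the summary describes: a sliding local window plus full (global) attention at a few designated positions. This is an illustration under my own naming (the function, its parameters, and the dense-mask formulation are assumptions for clarity), not the authors' implementation.

```python
import numpy as np

def longformer_mask(seq_len: int, window: int, global_positions: list[int]) -> np.ndarray:
    """Boolean mask: mask[i, j] is True if query i may attend to key j."""
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    # Local windowed attention: each token attends to `window` neighbors
    # on each side, so each row has O(window) nonzeros and the whole
    # pattern costs O(seq_len * window) rather than O(seq_len ** 2).
    for i in range(seq_len):
        mask[i, max(0, i - window): i + window + 1] = True
    # Task-motivated global attention is symmetric: designated tokens
    # (e.g. [CLS] for classification, question tokens for QA) attend to
    # all positions, and all positions attend back to them.
    for g in global_positions:
        mask[g, :] = True
        mask[:, g] = True
    return mask

if __name__ == "__main__":
    m = longformer_mask(seq_len=12, window=2, global_positions=[0])
    print(m.astype(int))  # 1 = attended, 0 = masked out
```

Note that materializing a dense mask like this still costs O(n²) memory; the paper's actual implementation instead computes the banded windowed attention with a custom CUDA kernel so that both time and memory scale linearly with sequence length.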
