February 09, 2023

A Length-Extrapolatable Transformer

23 minutes

Position modeling plays a critical role in Transformers. In this paper, we focus on length extrapolation, i.e., training on short texts while evaluating on longer sequences. We define attention resolution as an indicator of extrapolation. We then propose two designs to improve this metric for Transformers. Specifically, we introduce a relative position embedding to explicitly maximize attention resolution.

2022: Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei

https://arxiv.org/pdf/2212.10554v1.pdf
