The Gist Talk

Performers: Linear Transformers with Orthogonal Random Features



The paper introduces Performers, a class of Transformer architectures designed to overcome the quadratic time and space complexity of standard attention, which is often prohibitive for long sequences. Performers achieve linear complexity through a mechanism called Fast Attention Via positive Orthogonal Random features (FAVOR+), which provides a provably accurate estimate of regular full-rank softmax attention without relying on priors such as sparsity. The paper backs its claims with strong theoretical guarantees on estimation accuracy and variance reduction, in particular showing why positive random features are necessary in place of unstable trigonometric features. Experimental results confirm that Performers are efficient and effective across large-scale tasks, including text and protein sequence modeling, often matching or surpassing other efficient attention methods.
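For listeners who want to see the mechanism rather than just hear about it, the following is a minimal NumPy sketch of the positive orthogonal random feature idea behind FAVOR+. The function names, the number of features m, and the toy check at the end are illustrative choices made here, not the authors' released implementation.

import numpy as np

def orthogonal_gaussian(m, d, rng):
    # Draw m feature vectors in R^d that are orthogonal within each d-sized block,
    # with row norms matching those of i.i.d. Gaussian vectors.
    blocks = []
    for _ in range(int(np.ceil(m / d))):
        g = rng.standard_normal((d, d))
        q, _ = np.linalg.qr(g)                # orthonormal rows
        norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)
        blocks.append(q * norms[:, None])     # restore Gaussian-like lengths
    return np.concatenate(blocks, axis=0)[:m]

def positive_features(x, omega):
    # Positive random features: phi(x)_i proportional to exp(omega_i . x - |x|^2 / 2),
    # so that E[phi(q) . phi(k)] = exp(q . k), the softmax kernel.
    m = omega.shape[0]
    proj = x @ omega.T                         # (n, m)
    return np.exp(proj - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def linear_attention(Q, K, V, m=256, seed=0):
    # Approximate softmax attention in O(n * m * d) instead of O(n^2 * d):
    # the n-by-n attention matrix is never formed explicitly.
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    omega = orthogonal_gaussian(m, d, rng)
    scale = d ** -0.25                         # folds in the 1/sqrt(d) softmax scaling
    Qp = positive_features(Q * scale, omega)   # (n, m)
    Kp = positive_features(K * scale, omega)   # (n, m)
    num = Qp @ (Kp.T @ V)                      # (n, d_v)
    den = Qp @ Kp.sum(axis=0)                  # (n,) row normalizers
    return num / den[:, None]

# Toy comparison against exact softmax attention.
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
exact = np.exp(Q @ K.T / np.sqrt(16))
exact = (exact / exact.sum(-1, keepdims=True)) @ V
approx = linear_attention(Q, K, V, m=512)
print(np.max(np.abs(exact - approx)))          # small if the estimator is accurate

Because every feature value here is a positive exponential, the estimated attention weights can never go negative, which is the stability property the paper contrasts with trigonometric random features.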


The Gist Talk, by kw