The Gist Talk

Performers: Linear Transformers with Orthogonal Random Features



The paper introduces Performers, a class of Transformer architectures designed to overcome the quadratic time and space complexity of standard attention, which is often prohibitive for long sequences. Performers achieve linear complexity through a mechanism called Fast Attention Via positive Orthogonal Random features (FAVOR+), which provides a provably accurate estimate of regular full-rank softmax attention without relying on priors such as sparsity. The paper backs its claims with strong theoretical guarantees on estimation accuracy and variance reduction, in particular showing why positive random features are necessary in place of unstable trigonometric features. Experimental results confirm that Performers are efficient and effective across large-scale tasks, including text and protein sequence modeling, often matching or surpassing other efficient attention methods.
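For listeners who want to see the mechanism rather than just hear about it, the following is a minimal NumPy sketch of the positive orthogonal random feature idea behind FAVOR+. The function names, the number of features m, and the toy check at the end are illustrative choices made here, not the authors' released implementation.

import numpy as np

def orthogonal_gaussian(m, d, rng):
    # Draw m feature vectors in R^d that are orthogonal within each d-sized block,
    # with row norms matching those of i.i.d. Gaussian vectors.
    blocks = []
    for _ in range(int(np.ceil(m / d))):
        g = rng.standard_normal((d, d))
        q, _ = np.linalg.qr(g)                # orthonormal rows
        norms = np.linalg.norm(rng.standard_normal((d, d)), axis=1)
        blocks.append(q * norms[:, None])     # restore Gaussian-like lengths
    return np.concatenate(blocks, axis=0)[:m]

def positive_features(x, omega):
    # Positive random features: phi(x)_i proportional to exp(omega_i . x - |x|^2 / 2),
    # so that E[phi(q) . phi(k)] = exp(q . k), the softmax kernel.
    m = omega.shape[0]
    proj = x @ omega.T                         # (n, m)
    return np.exp(proj - 0.5 * np.sum(x**2, axis=-1, keepdims=True)) / np.sqrt(m)

def linear_attention(Q, K, V, m=256, seed=0):
    # Approximate softmax attention in O(n * m * d) instead of O(n^2 * d):
    # the n-by-n attention matrix is never formed explicitly.
    d = Q.shape[-1]
    rng = np.random.default_rng(seed)
    omega = orthogonal_gaussian(m, d, rng)
    scale = d ** -0.25                         # folds in the 1/sqrt(d) softmax scaling
    Qp = positive_features(Q * scale, omega)   # (n, m)
    Kp = positive_features(K * scale, omega)   # (n, m)
    num = Qp @ (Kp.T @ V)                      # (n, d_v)
    den = Qp @ Kp.sum(axis=0)                  # (n,) row normalizers
    return num / den[:, None]

# Toy comparison against exact softmax attention.
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
exact = np.exp(Q @ K.T / np.sqrt(16))
exact = (exact / exact.sum(-1, keepdims=True)) @ V
approx = linear_attention(Q, K, V, m=512)
print(np.max(np.abs(exact - approx)))          # small if the estimator is accurate

Because every feature value here is a positive exponential, the estimated attention weights can never go negative, which is the stability property the paper contrasts with trigonometric random features.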


The Gist Talk, by kw