Linear Transformers address the computational limitations of standard Transformer models, whose self-attention has quadratic complexity, O(n^2), in the input sequence length. Linear Transformers aim for linear complexity, O(n), making them suitable for longer sequences. They achieve this through methods such as low-rank approximations, local attention, or kernelized attention; examples include Linformer (low-rank projections), Longformer (sliding-window attention), and Performer (kernelized attention). Efficient attention, one form of linear attention, interprets keys as template attention maps and aggregates values into global context vectors, in contrast to dot-product attention, which synthesizes a pixel-wise attention map for every position. This approach allows more efficient use of resources in domains with large inputs or tight computational constraints.
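A minimal NumPy sketch of the contrast described above, assuming the common "efficient attention" formulation (softmax over the keys' sequence axis and over the queries' feature axis); the function names, shapes, and normalization choices here are illustrative assumptions, not the API of any particular library.

```python
# Sketch: standard dot-product attention (materializes an n x n map, O(n^2))
# vs. an efficient/linear attention factorization (O(n)), under the
# assumptions stated in the lead-in above.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    # Standard attention: an n x n attention map is built explicitly.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (n, n)
    return softmax(scores, axis=-1) @ V       # (n, d_v)

def efficient_attention(Q, K, V):
    # Linear attention: each normalized key channel acts as a "template
    # attention map" over positions; K^T V aggregates the values into a small
    # (d_k, d_v) set of global context vectors, so no n x n map is formed.
    q = softmax(Q, axis=-1)   # normalize queries over the feature axis
    k = softmax(K, axis=0)    # normalize keys over the sequence axis
    context = k.T @ V         # (d_k, d_v) global context vectors
    return q @ context        # (n, d_v)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d_k, d_v = 6, 4, 4
    Q = rng.normal(size=(n, d_k))
    K = rng.normal(size=(n, d_k))
    V = rng.normal(size=(n, d_v))
    print(dot_product_attention(Q, K, V).shape)  # (6, 4)
    print(efficient_attention(Q, K, V).shape)    # (6, 4)
```

The key point of the factorization is that the reduction over the sequence happens inside `k.T @ V`, whose cost grows linearly with n, rather than inside an n x n score matrix.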