Papers Read on AI

Self-attention Does Not Need O(n²) Memory



We provide a practical implementation for accelerators that requires O(√n) memory, is numerically stable, and is within a few percent of the runtime of the standard implementation of attention. We also demonstrate how to differentiate the function while remaining memory-efficient.
2021: Markus N. Rabe, Charles Staats
https://arxiv.org/pdf/2112.05682v2.pdf
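The abstract refers to computing attention over the keys and values in chunks, with the softmax accumulated incrementally and rescaled by a running maximum for numerical stability. Below is a minimal JAX sketch of that idea, not the authors' implementation; the function name, default chunk size, and the assumption that the key/value length divides evenly by the chunk size are illustrative.

# A minimal sketch (not the authors' code) of chunked, numerically stable attention.
import jax
import jax.numpy as jnp

def chunked_attention(query, key, value, key_chunk_size=128):
    # query: [q_len, d]; key, value: [kv_len, d].
    # Keys/values are processed in chunks, so only a [q_len, key_chunk_size]
    # score block is held in memory instead of the full [q_len, kv_len] matrix.
    # A chunk size on the order of sqrt(kv_len) gives the O(sqrt(n)) memory
    # the abstract mentions. Assumes kv_len is divisible by key_chunk_size.
    q_len, d = query.shape
    scale = 1.0 / jnp.sqrt(d)

    def scan_chunk(carry, chunk):
        acc, row_sum, row_max = carry            # running numerator, denominator, max
        k_chunk, v_chunk = chunk
        scores = (query @ k_chunk.T) * scale     # [q_len, key_chunk_size]
        chunk_max = scores.max(axis=-1, keepdims=True)
        new_max = jnp.maximum(row_max, chunk_max)
        # Rescale previously accumulated sums to the new running max so the
        # exponentials stay numerically stable.
        correction = jnp.exp(row_max - new_max)
        p = jnp.exp(scores - new_max)
        acc = acc * correction + p @ v_chunk
        row_sum = row_sum * correction + p.sum(axis=-1, keepdims=True)
        return (acc, row_sum, new_max), None

    init = (jnp.zeros((q_len, d)),
            jnp.zeros((q_len, 1)),
            jnp.full((q_len, 1), -jnp.inf))
    k_chunks = key.reshape(-1, key_chunk_size, d)
    v_chunks = value.reshape(-1, key_chunk_size, d)
    (acc, row_sum, _), _ = jax.lax.scan(scan_chunk, init, (k_chunks, v_chunks))
    return acc / row_sum

# Example: the chunked result should match standard attention up to precision.
q = jax.random.normal(jax.random.PRNGKey(0), (256, 64))
k = jax.random.normal(jax.random.PRNGKey(1), (1024, 64))
v = jax.random.normal(jax.random.PRNGKey(2), (1024, 64))
reference = jax.nn.softmax((q @ k.T) / jnp.sqrt(64.0), axis=-1) @ v
print(jnp.max(jnp.abs(chunked_attention(q, k, v) - reference)))

For the memory-efficient backward pass the abstract mentions, the paper relies on gradient checkpointing (e.g. jax.checkpoint) so that chunk-level score matrices are recomputed during backpropagation rather than stored; that detail is omitted from this sketch.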

Papers Read on AI, by Rob

3.7 (3 ratings)