Replaces quadratic softmax attention in looped architectures with linear or sparse attention mechanisms for iterative memory refinement, matching standard looped transformers at much lower computational cost.
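
To make the idea concrete, here is a minimal NumPy sketch of one way a linear attention step could be reused across loop iterations. It is an illustration under stated assumptions, not the method summarized above: the `elu(x) + 1` feature map, the non-causal formulation, the residual update, and the names `feature_map`, `linear_attention`, and `looped_refine` are all hypothetical choices made for this example. Sparse variants are not shown.

```python
import numpy as np

def feature_map(x):
    # Assumed positive feature map: elu(x) + 1 (x + 1 for x > 0, exp(x) otherwise).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v):
    # Non-causal linear attention: phi(Q) (phi(K)^T V) / (phi(Q) sum_j phi(K_j)).
    # Cost is O(n * d * d_v) instead of the O(n^2 * d) of softmax attention.
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v                # (d, d_v) summary of keys and values
    z = k.sum(axis=0)           # (d,) normalizer
    return (q @ kv) / (q @ z)[:, None]

def looped_refine(x, w_q, w_k, w_v, n_loops=4):
    # Apply the same (weight-tied) attention block repeatedly,
    # refining the state x with a residual update on each pass.
    for _ in range(n_loops):
        x = x + linear_attention(x @ w_q, x @ w_k, x @ w_v)
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))                        # 8 tokens, width 16
w_q, w_k, w_v = (rng.normal(size=(16, 16)) * 0.1 for _ in range(3))
refined = looped_refine(x, w_q, w_k, w_v, n_loops=4)
print(refined.shape)
```

Because the key-value summary `kv` has a fixed size independent of sequence length, each loop iteration scales linearly in the number of tokens, which is where the cost saving over quadratic softmax attention would come from in a design like this.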