AI Post Transformers

ELASTIC: Linear Attention for Sequential Interest Compression



This February 12, 2025 episode covers the KuaiShou Inc. paper introducing ELASTIC (Efficient Linear Attention for SequenTial Interest Compression), a framework designed to address the scalability limits of transformer-based sequential recommender systems, whose self-attention cost grows quadratically with sequence length. ELASTIC replaces full attention with a Linear Dispatcher Attention (LDA) layer that compresses long user behavior sequences into a compact set of dispatcher states, bringing time complexity down to linear in sequence length, substantially reducing GPU memory usage, and speeding up inference. To preserve recommendation accuracy despite this compression, the framework adds an Interest Memory Retrieval (IMR) technique that sparsely retrieves from a large interest memory bank, expanding model capacity with little extra compute. Experiments on datasets such as ML-1M and XLong show ELASTIC outperforming baseline methods while offering superior computational efficiency, especially when modeling long user sequences. Source: https://arxiv.org/pdf/2408.09380
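
The summary above doesn't spell out the layer design, but the dispatcher idea can be sketched in a few lines of PyTorch: a small set of k learnable dispatcher tokens first attends over the full length-N sequence to compress it, then the sequence attends back to those k compressed states, so both stages cost O(N·k) rather than O(N²). The class name `LinearDispatcherAttention` and every shape and hyperparameter below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LinearDispatcherAttention(nn.Module):
    """Two-stage attention through a small set of dispatcher tokens (sketch).

    Stage 1 compresses a length-N behavior sequence into k dispatcher
    states (cost O(N*k)); stage 2 broadcasts the compressed states back
    to all N positions (also O(N*k)), so the layer is linear in N.
    """

    def __init__(self, d_model: int, num_dispatchers: int, num_heads: int = 4):
        super().__init__()
        # k learnable dispatcher tokens, shared across the batch.
        self.dispatchers = nn.Parameter(torch.randn(num_dispatchers, d_model))
        self.compress = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.broadcast = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, N, d_model) -- embedded user behavior sequence.
        b = seq.size(0)
        disp = self.dispatchers.unsqueeze(0).expand(b, -1, -1)  # (batch, k, d)
        # Stage 1: dispatchers attend to the full sequence (compression).
        compressed, _ = self.compress(query=disp, key=seq, value=seq)
        # Stage 2: sequence positions attend to the k compressed states.
        out, _ = self.broadcast(query=seq, key=compressed, value=compressed)
        return out

# Smoke test: a 2048-step sequence never forms an N-by-N attention matrix.
layer = LinearDispatcherAttention(d_model=64, num_dispatchers=16)
x = torch.randn(2, 2048, 64)
print(layer(x).shape)  # torch.Size([2, 2048, 64])
```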
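
The interest memory retrieval step can likewise be pictured as a large embedding table from which only the top-k best-matching rows are gathered per user. Again a minimal sketch under stated assumptions: `InterestMemoryRetrieval`, the bank size, and the softmax weighting over retrieved rows are hypothetical choices for illustration, not the paper's exact retrieval or training scheme.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InterestMemoryRetrieval(nn.Module):
    """Sparse top-k retrieval from a large interest memory bank (sketch)."""

    def __init__(self, d_model: int, num_slots: int = 10_000, top_k: int = 4):
        super().__init__()
        self.memory = nn.Embedding(num_slots, d_model)  # the interest bank
        self.top_k = top_k

    def forward(self, user_state: torch.Tensor) -> torch.Tensor:
        # user_state: (batch, d_model), e.g. a pooled sequence representation.
        scores = user_state @ self.memory.weight.T        # (batch, num_slots)
        top_scores, top_idx = scores.topk(self.top_k, dim=-1)
        retrieved = self.memory(top_idx)                  # (batch, k, d_model)
        weights = F.softmax(top_scores, dim=-1).unsqueeze(-1)
        # Weighted sum of the k retrieved interest vectors.
        return (weights * retrieved).sum(dim=1)           # (batch, d_model)

imr = InterestMemoryRetrieval(d_model=64)
pooled = torch.randn(2, 64)
print(imr(pooled).shape)  # torch.Size([2, 64])
```

The point this illustrates is the capacity/compute split the summary describes: the parameter budget grows with the bank size, while per-request compute only ever touches k rows.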

By mcgrof