AI: post transformers

RetNet: Retentive Network: A Successor to Transformer for Large Language Models



The August 9, 2023 paper introduces the **Retentive Network (RetNet)**, a proposed foundational architecture for large language models intended to succeed the **Transformer**. RetNet aims to overcome the Transformer's inefficiencies during inference by simultaneously achieving **training parallelism**, **low-cost inference**, and **strong performance**, a combination previously considered an "impossible triangle." The core of RetNet is the **retention mechanism**, which supports three computation paradigms—**parallel, recurrent, and chunkwise recurrent**—to enable efficient training and constant-time, O(1) per-token inference, leading to significant reductions in GPU memory usage and latency, and higher throughput compared to the Transformer. Experimental results across various model sizes and tasks demonstrate that RetNet is competitive in performance and offers superior efficiency in both training and deployment.
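To make the mechanism concrete, below is a minimal single-head NumPy sketch of the parallel and recurrent retention forms described in the paper, assuming pre-computed query, key, and value projections and omitting the xPos-style rotation, multi-scale decay across heads, group normalization, gating, and the chunkwise form; it is an illustrative simplification, not the paper's full implementation.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    """Parallel form (used for training): Retention(X) = (Q K^T * D) V,
    where D[n, m] = gamma**(n - m) for n >= m and 0 otherwise."""
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 gamma ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V

def retention_recurrent(Q, K, V, gamma):
    """Recurrent form (used at inference): per step,
    S_n = gamma * S_{n-1} + k_n^T v_n and output_n = q_n S_n,
    so each new token costs O(1) regardless of sequence length."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = np.zeros((d_k, d_v))          # constant-size recurrent state
    outputs = []
    for q, k, v in zip(Q, K, V):
        S = gamma * S + np.outer(k, v)
        outputs.append(q @ S)
    return np.stack(outputs)

# The two forms compute the same result (up to floating-point error),
# which is what lets RetNet train in parallel but decode recurrently.
rng = np.random.default_rng(0)
T, d = 8, 4
Q, K, V = rng.normal(size=(3, T, d))
assert np.allclose(retention_parallel(Q, K, V, 0.9),
                   retention_recurrent(Q, K, V, 0.9))
```

The decay factor gamma here is a placeholder scalar; in the paper each head uses a different fixed decay, and the chunkwise recurrent form combines the two strategies to process long sequences in parallel chunks while carrying the recurrent state between chunks.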


Source:

https://arxiv.org/pdf/2307.08621


By mcgrof