AI: post transformers

MEGABYTE: Multiscale Transformers for Million-byte Sequences



The research paper introduces MEGABYTE, a multiscale transformer architecture designed to efficiently model exceptionally long sequences, in excess of one million bytes. Traditional transformers struggle at this length because self-attention cost grows quadratically with sequence length and the per-position feedforward layers dominate compute. MEGABYTE instead segments the byte sequence into fixed-size "patches", runs a large global transformer across patch representations, and runs a small local submodel within each patch to predict individual bytes. This decomposition reduces self-attention cost from O(T^2) to roughly O((T/P)^2 + T*P) for sequence length T and patch size P, allows much larger feedforward layers at the same cost (they run once per patch rather than once per byte), and yields greater parallelism during decoding, improving generation speed. Extensive experiments demonstrate MEGABYTE's strong performance across modalities, including long-context language modeling, high-resolution image generation, and raw audio modeling, often outperforming existing methods and establishing the viability of tokenization-free autoregressive sequence modeling at scale.
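
To make the patch decomposition concrete, here is a minimal, hypothetical PyTorch sketch of the two-level structure. It is not the authors' released code: the hyperparameters (patch_size, d_local, d_global), the simple linear patch projections, and the layer counts are illustrative assumptions, and the exact input shifting MEGABYTE uses for strict autoregression is omitted for brevity.

```python
# Minimal sketch of a MEGABYTE-style global/local decomposition (illustrative only).
import torch
import torch.nn as nn

class MegabyteSketch(nn.Module):
    def __init__(self, vocab=256, patch_size=8, d_local=128, d_global=512):
        super().__init__()
        self.P, self.D = patch_size, d_local
        self.embed = nn.Embedding(vocab, d_local)
        # Global model: a causal transformer over whole-patch embeddings.
        self.to_patch = nn.Linear(patch_size * d_local, d_global)
        g_layer = nn.TransformerEncoderLayer(d_global, nhead=8, batch_first=True)
        self.global_model = nn.TransformerEncoder(g_layer, num_layers=4)
        self.from_patch = nn.Linear(d_global, patch_size * d_local)
        # Local model: a small transformer applied independently inside each patch.
        l_layer = nn.TransformerEncoderLayer(d_local, nhead=4, batch_first=True)
        self.local_model = nn.TransformerEncoder(l_layer, num_layers=2)
        self.head = nn.Linear(d_local, vocab)

    def forward(self, byte_ids):                      # byte_ids: (B, T), T % P == 0
        B, T = byte_ids.shape
        P, D = self.P, self.D
        x = self.embed(byte_ids)                      # (B, T, D)
        patches = x.view(B, T // P, P * D)            # group bytes into patches
        causal_g = nn.Transformer.generate_square_subsequent_mask(T // P)
        g = self.global_model(self.to_patch(patches), mask=causal_g)
        ctx = self.from_patch(g).view(B, T // P, P, D)
        # Inject each patch's global summary into its bytes, then let the
        # local model attend only within the patch (P positions, not T).
        local_in = (x.view(B, T // P, P, D) + ctx).reshape(B * (T // P), P, D)
        causal_l = nn.Transformer.generate_square_subsequent_mask(P)
        h = self.local_model(local_in, mask=causal_l)
        return self.head(h).view(B, T, -1)            # per-byte next-token logits

model = MegabyteSketch()
logits = model(torch.randint(0, 256, (2, 64)))        # -> (2, 64, 256)
```

The design choice visible here is the one the summary describes: the expensive global model attends over only T/P patch positions while each cheap local model attends over only P bytes, which is where the O((T/P)^2 + T*P) attention cost comes from.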


Source: 2023 - https://arxiv.org/pdf/2305.07185 - MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers


By mcgrof