The research paper introduces MEGABYTE, a multi-scale transformer architecture designed to efficiently model exceptionally long sequences of more than one million bytes. Traditional transformers struggle with long sequences because of the quadratic cost of self-attention and the expense of large per-position feedforward layers. MEGABYTE instead segments the byte sequence into fixed-size "patches" and uses a small local submodel within each patch together with a larger global model that operates between patches. This decomposition substantially reduces computational complexity, allowing larger models to be trained at lower cost and speeding up generation. Extensive experiments demonstrate strong performance across modalities, including long-context language modeling, high-resolution image generation, and raw audio modeling, often outperforming existing methods and establishing the viability of tokenization-free autoregressive sequence modeling at scale.
Source: 2023 - https://arxiv.org/pdf/2305.07185 - MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers
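The cost saving described above can be sketched with back-of-the-envelope arithmetic. The snippet below is an illustrative approximation (ignoring model widths, constants, and the feedforward layers): a standard transformer pays roughly O(T^2) in self-attention over T bytes, while a patch-based multiscale model pays O((T/P)^2) for the global model over T/P patches plus T/P * O(P^2) for the local models inside each patch. The function names and the example patch size are choices made here for illustration, not values from the paper.

```python
def vanilla_attention_cost(T: int) -> int:
    """Pairwise self-attention over the full byte sequence: O(T^2)."""
    return T * T


def multiscale_attention_cost(T: int, P: int) -> int:
    """Global attention over T/P patches plus local attention within each patch.

    Assumes T is divisible by the patch size P.
    """
    num_patches = T // P
    global_cost = num_patches * num_patches   # global model over patches
    local_cost = num_patches * P * P          # one local model per patch
    return global_cost + local_cost


if __name__ == "__main__":
    T = 1_000_000   # one million bytes, as in the paper's title
    P = 1_000       # illustrative patch size (~sqrt(T))
    print(f"vanilla:    {vanilla_attention_cost(T):.2e}")
    print(f"multiscale: {multiscale_attention_cost(T, P):.2e}")
```

With these illustrative numbers the patch-based decomposition reduces the attention term by roughly three orders of magnitude (about 1e12 versus about 1e9 pairwise interactions), which is the intuition behind processing million-byte sequences.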
By mcgrof