The paper "SuperBPE: Space Travel for Language Models" (August 26, 2025), a collaboration between the University of Washington, NVIDIA, and the Allen Institute for AI, introduces SuperBPE, a tokenization method that challenges the standard practice of limiting tokens to subword boundaries. The authors argue that conventional Byte-Pair Encoding (BPE) is inefficient because it cannot create "superword" tokens that bridge whitespace, ignoring common multi-word expressions that function as single semantic units. SuperBPE addresses this with a two-stage curriculum in BPE training: it first learns subword tokens, then lifts the whitespace restriction to learn superword tokens, encoding the same text in up to 33% fewer tokens. In experiments with 8B-parameter transformer language models (LMs), models trained with SuperBPE achieve an average improvement of +4.0% across 30 downstream tasks while requiring 27% less compute at inference time than BPE baselines. The analysis suggests SuperBPE's advantage comes from more uniform per-token difficulty, since cohesive multi-word expressions are captured as single tokens. Source: https://arxiv.org/pdf/2503.13423
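
To make the two-stage curriculum concrete, here is a minimal toy sketch of the idea in Python. It is not the authors' implementation: the function names, the transition step `t`, the character-level starting alphabet, and the tiny corpus are assumptions made for illustration. Stage 1 behaves like standard BPE by forbidding merges that touch whitespace; after `t` merges, stage 2 drops that restriction so frequent merges can bridge spaces and form superword tokens.

```python
# Toy sketch of a two-stage BPE curriculum in the spirit of SuperBPE.
# Illustrative only; simplified relative to the paper's actual training setup.
from collections import Counter

def get_pair_counts(sequences):
    """Count adjacent token pairs across all sequences."""
    counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[(a, b)] += 1
    return counts

def merge_pair(sequences, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged = pair[0] + pair[1]
    out = []
    for seq in sequences:
        new_seq, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                new_seq.append(merged)
                i += 2
            else:
                new_seq.append(seq[i])
                i += 1
        out.append(new_seq)
    return out

def train_superbpe(corpus, num_merges, t):
    """Learn `num_merges` merges: the first `t` are restricted to subwords
    (no merge may cross a space); the rest may form superword tokens."""
    # Start from characters, keeping spaces as ordinary symbols so that
    # stage-2 merges can later absorb them into multi-word tokens.
    sequences = [list(line) for line in corpus]
    merges = []
    for step in range(num_merges):
        counts = get_pair_counts(sequences)
        if step < t:
            # Stage 1: behave like standard BPE -- exclude any pair that
            # contains whitespace, so tokens stay within word boundaries.
            counts = Counter({p: c for p, c in counts.items()
                              if " " not in p[0] and " " not in p[1]})
        if not counts:
            break
        best = max(counts, key=counts.get)  # most frequent pair wins
        merges.append(best)
        sequences = merge_pair(sequences, best)
    return merges

if __name__ == "__main__":
    corpus = ["by the way", "by the book", "on the way"] * 10
    merges = train_superbpe(corpus, num_merges=14, t=8)
    print(merges)  # later merges can bridge spaces, e.g. (' ', 'the')
```

In this sketch the transition step `t` and the total number of merges are free parameters; the paper similarly treats the point at which superword learning begins as a tokenizer-training choice rather than something fixed by the algorithm.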