Learning GenAI via SOTA Papers

EP085: Aya 23 Breaks The Curse Of Multilinguality


The technical report introduces Aya 23, a family of open-weight, multilingual instruction-tuned language models developed by Cohere For AI that support 23 languages.

Building on the previous Aya 101 model, which prioritized language breadth (covering 101 languages), Aya 23 instead experiments with "depth versus breadth". By allocating more model capacity to fewer languages that are included during pre-training, Aya 23 mitigates the well-documented "curse of multilinguality", a phenomenon in which a model's performance on individual languages drops as it is forced to share capacity across too many languages.
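As a back-of-envelope illustration (not an analysis from the report), the trade-off can be seen by naively splitting a fixed parameter budget equally across languages; real transformer capacity is shared rather than partitioned, so the figures below are only directional.

```python
# Naive "depth versus breadth" arithmetic: an equal split of a fixed
# hypothetical 8B-parameter budget. Real models share capacity across
# languages, so this is a directional simplification, not the paper's method.

def capacity_per_language(total_params: float, num_languages: int) -> float:
    """Equal share of a parameter budget per language."""
    return total_params / num_languages

BUDGET = 8e9  # fixed hypothetical budget, sized like Aya 23's 8B model

for name, n_langs in [("breadth (101 languages)", 101), ("depth (23 languages)", 23)]:
    share = capacity_per_language(BUDGET, n_langs)
    print(f"{name}: ~{share / 1e6:.0f}M parameters per language")

# breadth (101 languages): ~79M parameters per language
# depth (23 languages): ~348M parameters per language
```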

Key highlights of the paper include:

  • Model Sizes: Aya 23 is released as open weights in two sizes: an 8-billion-parameter (8B) model offering best-in-class performance on consumer-grade hardware, and a 35-billion-parameter (35B) model, based on Cohere's Command R, for the highest overall performance (a minimal loading sketch follows this list).
  • Strong Performance: The Aya 23 models consistently outperform both the previous massively multilingual Aya 101 model and widely used open-weight models of similar sizes (such as Gemma, Mistral, and Mixtral).
  • Comprehensive Benchmark Gains: Aya 23 achieves significant improvements across a wide range of benchmarks, including up to a 14% improvement on discriminative tasks, a 20% improvement on generative tasks, and a 6.6x increase in multilingual mathematical reasoning compared to Aya 101.
  • Purpose: The initiative aims to combat the English-centric bias in natural language processing, bringing state-of-the-art language capabilities to approximately half of the global population while reducing the high latencies and performance cliffs experienced by non-English speakers.
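Because the weights are openly released, a minimal loading sketch using the Hugging Face transformers library might look like the following; the repo id CohereForAI/aya-23-8B and the chat-template usage are assumptions based on common Hub conventions, not details stated in the episode.

```python
# A minimal sketch, assuming the 8B weights are hosted on the Hugging Face Hub
# under "CohereForAI/aya-23-8B" (the repo id is an assumption, not from the episode).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CohereForAI/aya-23-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Instruction-tuned models generally expect a chat-formatted prompt.
messages = [{"role": "user", "content": "Translate to Turkish: How are you today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

The 35B variant would load the same way with a different repo id, at a correspondingly higher memory cost.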

Learning GenAI via SOTA Papers, by Yun Wu