Learning GenAI via SOTA Papers

EP083: How Meta Engineered the Llama 3 Herd



The paper presents Llama 3, a new family of foundation language models developed by Meta, featuring models with 8B, 70B, and a flagship 405B parameters. These models natively support multilinguality, coding, reasoning, and tool usage, with the 405B model capable of processing information in a context window of up to 128K tokens.

The development of Llama 3 focuses on optimizing data, scale, and complexity:

  • Pre-training: The models were pre-trained on a massive corpus of 15.6 trillion tokens, which is substantially larger and of higher quality than the data used for Llama 2.
  • Post-training: The models underwent rigorous alignment using supervised finetuning (SFT), rejection sampling, and direct preference optimization (DPO) to better follow instructions and ensure helpfulness and harmlessness.
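Of the alignment methods above, DPO has a particularly compact formulation: it trains the policy directly on preference pairs by maximizing the log-probability margin of the chosen response over the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair DPO loss (the function name and scalar inputs are illustrative, not from the paper; in practice the log-probabilities come from the language model):

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair.

    Inputs are the summed token log-probabilities of the chosen and
    rejected responses under the policy being trained and under the
    frozen reference policy. beta scales the implicit reward margin.
    """
    # Implicit reward margin: how much more the policy (vs. the
    # reference) prefers the chosen response over the rejected one.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log sigmoid(margin): shrinks as the policy learns the preference.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly, the margin is zero and the loss is log 2; shifting probability mass toward the chosen response lowers it.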

Extensive empirical and human evaluations demonstrate that the flagship 405B model performs on par with leading closed-source models like GPT-4 across a wide variety of tasks, while the 8B and 70B models deliver best-in-class performance among models of similar size.

The paper also highlights robust safety measures, including the release of Llama Guard 3 for system-level input and output safety. Finally, the authors detail ongoing, unreleased experiments integrating image, video, and speech capabilities into Llama 3 using a compositional approach, which has shown competitive results against state-of-the-art multimodal models.


By Yun Wu