Learning GenAI via SOTA Papers

EP050: How Meta's LLaMA Beat GPT-3



The paper introduces LLaMA, a collection of foundation language models ranging from 7 billion to 65 billion parameters, developed by Meta AI. A major contribution of this work is its demonstration that state-of-the-art models can be trained exclusively on publicly available datasets such as CommonCrawl, Wikipedia, and arXiv. This contrasts with most existing large language models (LLMs), which rely on undocumented or proprietary data; it makes LLaMA compatible with open-sourcing and helps democratize access to LLM research.

The authors' primary objective was to achieve the best possible performance for various inference budgets, rather than focusing solely on the fastest training time. They found that while it might be cheaper to train a massive model to a certain performance level, a smaller model trained on significantly more data will ultimately be cheaper and more efficient during inference. Consequently, they trained their models on up to 1.4 trillion tokens.
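The trade-off can be sketched with the common ~2N FLOPs-per-token approximation for a decoder-only transformer (an assumption borrowed from the scaling-law literature, not a figure from the paper itself):

```python
# Rough inference-cost comparison using the common ~2 * N FLOPs-per-token
# approximation for a decoder-only transformer. This is a back-of-the-envelope
# sketch, not a measurement from the LLaMA paper.

def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per generated token."""
    return 2 * n_params

llama_13b = inference_flops_per_token(13e9)   # ~2.6e10 FLOPs/token
gpt3_175b = inference_flops_per_token(175e9)  # ~3.5e11 FLOPs/token

# A 13B model serves each token at roughly 1/13th the compute of a 175B model,
# which is the inference-budget argument for training smaller models on more tokens.
print(f"GPT-3 / LLaMA-13B cost ratio: {gpt3_175b / llama_13b:.1f}x")
```

Under this approximation, every token served by GPT-3 costs about 13x the compute of a token from LLaMA-13B, so extra training spend on a smaller model is amortized quickly at deployment.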

The results show exceptional performance relative to model size:

  • LLaMA-13B outperforms the much larger GPT-3 (175B parameters) on most benchmarks despite being 10 times smaller, allowing it to run efficiently on a single GPU.
  • LLaMA-65B is competitive with top-tier models like Chinchilla-70B and PaLM-540B across a wide range of tasks, including common sense reasoning, closed-book question answering, and reading comprehension.
  • Despite not being explicitly fine-tuned for code or mathematics, the models also show strong performance in code generation and mathematical reasoning when compared to other general models of similar sizes.
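The single-GPU claim for LLaMA-13B can be checked with weights-only memory arithmetic (a rough sketch; activations and the KV cache are ignored, so real usage is somewhat higher):

```python
# Back-of-the-envelope memory footprint for model weights at a given precision.
# Activations and KV cache are ignored, so actual serving memory is higher.

def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Weights-only memory in GB (1e9 bytes) for n_params parameters."""
    return n_params * bytes_per_param / 1e9

print(weight_memory_gb(13e9, 2))   # fp16: 26.0 GB -> fits on one 80 GB GPU
print(weight_memory_gb(175e9, 2))  # fp16: 350.0 GB -> requires multiple GPUs
```

At fp16, LLaMA-13B's weights occupy roughly 26 GB, within a single modern accelerator, while a 175B-parameter model needs about 350 GB and must be sharded across devices.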

By Yun Wu