


The paper introduces LLaMA, a collection of foundation language models ranging from 7 billion to 65 billion parameters developed by Meta AI. A major contribution of this work is its demonstration that state-of-the-art models can be trained exclusively using publicly available datasets, such as CommonCrawl, Wikipedia, and arXiv. This contrasts with most existing large language models (LLMs) that rely on undocumented or proprietary data, making LLaMA compatible with open-sourcing and helping to democratize access to LLM research.
The authors' primary objective was to achieve the best possible performance at a range of inference budgets, rather than to minimize training cost alone. They observed that although it can take less training compute to reach a given performance level with a larger model, a smaller model trained on substantially more data ends up cheaper and faster at inference, where most of a deployed model's cost accrues. Consequently, they trained their models on up to 1.4 trillion tokens.
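A rough back-of-envelope sketch (not from the paper) illustrates the inference-budget argument: a dense transformer's forward pass costs on the order of 2 FLOPs per parameter per token, so serving cost scales roughly linearly with model size, regardless of how many tokens the model was trained on.

```python
# Rough approximation (an assumption for illustration, not a figure from
# the paper): a dense transformer's forward pass costs about 2 * N FLOPs
# per generated token, where N is the parameter count.

def inference_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs per token (~2 FLOPs per parameter)."""
    return 2.0 * n_params

llama_13b = inference_flops_per_token(13e9)
llama_65b = inference_flops_per_token(65e9)

print(f"13B: {llama_13b:.1e} FLOPs/token")   # 2.6e+10
print(f"65B: {llama_65b:.1e} FLOPs/token")   # 1.3e+11
print(f"65B costs {llama_65b / llama_13b:.1f}x more per token")  # 5.0x
```

Under this estimate, a 13B model is about 5x cheaper to serve per token than a 65B one, which is why spending extra training compute to push a smaller model's quality higher can pay off over the model's deployed lifetime.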
The results show exceptional performance relative to model size:
By Yun Wu