The paper introduces Llama 2, a family of pretrained and fine-tuned large language models (LLMs) developed by Meta, ranging in scale from 7 billion to 70 billion parameters.
Here are the key highlights from the paper:
- Pretraining Improvements: Llama 2 was pretrained on 2 trillion tokens from a new mix of publicly available data. Compared to its predecessor (Llama 1), Llama 2 features a 40% larger pretraining corpus, double the context length (4,096 tokens), and uses grouped-query attention (GQA) to improve inference scalability for its larger models.
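The GQA mechanism mentioned above lets several query heads share a single key/value head, shrinking the KV cache at inference time. A minimal sketch in NumPy, assuming single-sequence inputs with no causal masking (the shapes and function name here are illustrative, not the paper's implementation):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Minimal grouped-query attention sketch (one sequence, no masking).

    q:    (n_q_heads,  seq, d)  -- one projection per query head
    k, v: (n_kv_heads, seq, d)  -- fewer shared key/value heads

    Each group of n_q_heads // n_kv_heads query heads attends to one
    KV head, so the KV cache shrinks by that same factor.
    """
    group = q.shape[0] // k.shape[0]
    # Repeat each KV head so it is shared across its group of query heads.
    k = np.repeat(k, group, axis=0)              # (n_q_heads, seq, d)
    v = np.repeat(v, group, axis=0)
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (n_q_heads, seq, seq)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)               # softmax over keys
    return w @ v                                     # (n_q_heads, seq, d)
```

With 8 query heads and 2 KV heads, the KV cache is a quarter of the multi-head-attention size while the output shape matches standard attention.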
- Llama 2-Chat: The authors specifically developed and released Llama 2-Chat, a version fine-tuned and optimized for dialogue use cases. This was achieved through Supervised Fine-Tuning (SFT) and iterative Reinforcement Learning from Human Feedback (RLHF), which included both rejection sampling and Proximal Policy Optimization (PPO).
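The rejection-sampling step of that RLHF loop amounts to best-of-K selection: sample several responses from the current policy, score them with the learned reward model, and keep the highest-scoring one as a new fine-tuning target. A sketch, where `generate` and `reward` are hypothetical stand-ins for the policy and reward models rather than the paper's actual interfaces:

```python
def rejection_sample(prompt, generate, reward, k=4):
    """Best-of-k rejection sampling (sketch of the RLHF step described above).

    generate(prompt) -> str        : samples one response from the policy
    reward(prompt, response) -> float : learned reward model score
    Returns the highest-reward candidate, to be used as a training target.
    """
    candidates = [generate(prompt) for _ in range(k)]
    return max(candidates, key=lambda r: reward(prompt, r))
```

In the paper's pipeline this selection is applied iteratively, with PPO used on top of the rejection-sampled checkpoints in later rounds.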
- Novel Techniques: The researchers introduced Ghost Attention (GAtt), a method designed to help the model maintain system instructions and consistency across multiple turns of a conversation. They also observed emergent behaviors in the model, such as the ability to temporally organize knowledge and utilize external tools in a zero-shot context.
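Ghost Attention works at the data level: to synthesize training dialogues, the system instruction is concatenated to every user turn so the sampled assistant replies all respect it, and the final training sample then keeps the instruction only on the first turn, teaching the model to carry it across the conversation. A simplified sketch of that data construction (the function name and tuple format are illustrative assumptions, not the paper's code):

```python
def gatt_training_sample(instruction, turns):
    """Sketch of Ghost Attention (GAtt) data construction, simplified.

    instruction: system instruction to enforce across the dialogue
    turns: list of (user_msg, assistant_msg) pairs

    Returns (augmented, final):
      augmented -- instruction prepended to EVERY user turn; used only to
                   sample assistant replies that all follow the instruction
      final     -- instruction kept on the FIRST turn only; this is the
                   sample the model is fine-tuned on
    """
    augmented = [(f"{instruction} {u}", a) for u, a in turns]
    final = [(f"{instruction} {u}" if i == 0 else u, a)
             for i, (u, a) in enumerate(turns)]
    return augmented, final
```

The paper additionally zeroes the training loss on earlier turns so the model is only penalized on the latest reply; that detail is omitted here for brevity.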
- Safety and Alignment: A major focus of the paper is the responsible development of LLMs. The team conducted extensive safety tuning using safety-specific RLHF, context distillation, and rigorous red-teaming exercises with experts to identify and mitigate risks like toxic language, bias, and harmful activities.
- Performance: According to extensive human evaluations and automated benchmarks, Llama 2-Chat outperforms existing open-source chat models in helpfulness and safety. Furthermore, it performs on par with several prominent closed-source models, such as ChatGPT and PaLM.
- Open Availability: The Llama 2 models are released openly for both research and commercial use to encourage collaboration, democratize access, and promote responsible AI innovation within the community.