Large Language Model (LLM) Talk

LLM Training



Training large language models (LLMs) is challenging because it demands large amounts of GPU memory and long training times. Several parallelism paradigms (data parallelism, model parallelism, pipeline parallelism, and tensor parallelism) distribute the training workload across multiple GPUs. Memory-saving strategies such as CPU offloading, activation recomputation, mixed-precision training, and compression, together with suitable model architecture choices, make it possible to train very large neural networks. On scaling, model size and the number of training tokens should grow in equal proportion as the compute budget grows, so doubling the model size calls for doubling the training tokens. By this measure, current large language models are significantly under-trained: they see too few tokens for their size.
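The equal-scaling rule can be made concrete with a small calculation. The sketch below is illustrative only and rests on two assumptions that are not stated in the episode description: that training compute is roughly 6 FLOPs per parameter per token, and that a compute-optimal run uses about 20 tokens per parameter. The function name `compute_optimal` is hypothetical. Under those assumptions, quadrupling the compute budget doubles both the model size and the token count.

```python
import math

# Assumption (not from the episode): a rough tokens-per-parameter rule of thumb.
TOKENS_PER_PARAM = 20.0
# Assumption (not from the episode): training cost C is approximately 6 * N * D FLOPs.
FLOPS_PER_PARAM_TOKEN = 6.0


def compute_optimal(flops_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (params, tokens) with tokens = ratio * params.

    Solves C = 6 * N * D with D = TOKENS_PER_PARAM * N, so both N and D
    grow with the square root of the budget, i.e. in equal proportion.
    """
    params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens


base_params, base_tokens = compute_optimal(1e21)
big_params, big_tokens = compute_optimal(4e21)  # four times the compute

print(f"base: {base_params:.3e} params, {base_tokens:.3e} tokens")
print(f"4x:   {big_params:.3e} params, {big_tokens:.3e} tokens")
print(f"param growth: {big_params / base_params:.2f}x, "
      f"token growth: {big_tokens / base_tokens:.2f}x")
```

The constants only set the absolute numbers. The point of the sketch is the proportionality: whatever ratio is chosen, model size and training tokens rise together rather than model size alone.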


Large Language Model (LLM) Talk, by AI-Talk


