


In this episode of Artificial Intelligence: Papers and Concepts, curated by Dr. Satya Mallick, we break down DeepMind's 2022 paper "Training Compute-Optimal Large Language Models"—the work that challenged the "bigger is always better" era of LLM scaling.
You'll learn why many famous models were under-trained, what it means to be compute-optimal, and why the best performance comes from scaling model size and training data together.
We also unpack the Chinchilla vs. Gopher showdown, why Chinchilla won with the same compute budget, and what this shift means for the future: data quality and curation may matter more than ever.
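If you want to play with the paper's headline rule of thumb before (or after) listening, here is a minimal Python sketch. It is our illustration, not DeepMind's code, and it uses the common approximations C ≈ 6·N·D training FLOPs and roughly 20 training tokens per parameter; the exact constants are simplifications of the paper's fitted scaling laws.

```python
# Minimal sketch of the Chinchilla-style compute-optimal rule of thumb.
# Assumptions: training compute C ~ 6 * N * D FLOPs, and the compute-optimal
# recipe scales parameters N and tokens D together, roughly D ~ 20 * N.

def compute_optimal_split(flops_budget, tokens_per_param=20):
    """Return an approximate compute-optimal (params, tokens) pair for a FLOPs budget."""
    # From C = 6 * N * D and D = tokens_per_param * N:
    #   C = 6 * tokens_per_param * N^2  =>  N = sqrt(C / (6 * tokens_per_param))
    n_params = (flops_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: a Gopher-scale budget of roughly 5.7e23 FLOPs comes out to about a
# 70B-parameter model trained on about 1.4T tokens -- close to Chinchilla's setup,
# versus Gopher's 280B parameters trained on far fewer tokens for similar compute.
params, tokens = compute_optimal_split(5.7e23)
print(f"~{params/1e9:.0f}B parameters, ~{tokens/1e12:.1f}T tokens")
```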
Resources:
Paper: Training Compute-Optimal Large Language Models https://arxiv.org/pdf/2203.15556
Need help building computer vision and AI solutions? https://bigvision.ai
Start a career in computer vision and AI https://opencv.org/university