
Ever wondered how much information your favorite AI language models, like GPT, actually retain from their training data? In this episode of AI Odyssey, we delve into groundbreaking research by John X. Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G. Edward Suh, Alexander M. Rush, Kamalika Chaudhuri, and Saeed Mahloujifar. The authors introduce a new method for quantifying memorization in AI, distinguishing between unintended memorization (dataset-specific information) and generalization (knowledge of underlying data patterns). They estimate that GPT-style models have a surprising capacity of about 3.6 bits per parameter, and the study explores how memorization plateaus once that capacity is filled and eventually gives way to genuine generalization, a phenomenon known as "grokking."
Created using Google's NotebookLM, this episode demystifies how language models balance memorization and generalization, offering fresh insights into model training and privacy implications.
Dive deeper into the full paper here: https://www.arxiv.org/abs/2505.24832
 By Anlie Arnaudy, Daniel Herbera and Guillaume Fournier