Pretrained

Why Your Agent is Cheating



Pierce and Richard are back for the second listener mailbag. They break down what reward hacking really is and why models so often learn the wrong lesson, explain practical fine-tuning (from pre-training to prompting), unpack why LLMs use tokens instead of words, discuss whether context length is a hardware or a mathematical limitation, and much more.

Pretrained, by Pierce Freeman & Richard Diehl Martinez