


Pierce and Richard are back for the second listener mailbag. They break down what reward hacking really is and why models so often learn the wrong lesson, explain practical fine-tuning (from pre-training to prompting), unpack why LLMs use tokens instead of words, discuss whether context length is a hardware or a mathematical limitation, and much more.
By Pierce Freeman & Richard Diehl Martinez