In this episode:
• The Art and Science of Scaling RL: Professor Norris and Linda introduce today's topic, a new paper from Meta that aims to make training large models with reinforcement learning more predictable and scientific.
• More Art than Science: Linda explains why scaling reinforcement learning is so much harder than scaling pre-training, highlighting the lack of predictive scaling laws and the immense compute costs that sideline smaller research groups.
• Not a Power Law, but a Sigmoid: The hosts discuss the paper's core proposal: modeling RL performance as a sigmoidal function of training compute rather than a power law. Linda breaks down the key parameters, such as asymptotic performance (A) and compute efficiency (B), while Professor Norris relates the curve to human learning curves. (A minimal code sketch of this curve follows the list.)
• The ScaleRL Cookbook: Linda walks through the 'ScaleRL' recipe, a combination of techniques discovered through a massive 400,000 GPU-hour study. They discuss the difference between choices that raise the performance ceiling and those that merely improve compute efficiency.
• Predictable Progress and The Bitter Lesson: The hosts discuss the implications of this work, such as enabling cheaper, more accessible research by fitting the curve on small-scale experiments and extrapolating to larger compute budgets, and how it reinforces the 'bitter lesson' of prioritizing scalable methods. (A second sketch after the list illustrates the extrapolation idea.)
• Next Week on Mechanical Dreams: Professor Norris and Linda wrap up their discussion on scaling RL and give a brief teaser for the topic of next week's episode.
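
For listeners who want to see the shape being discussed, here is a minimal Python sketch of a saturating compute-performance curve with an asymptotic ceiling A and an efficiency exponent B, as described in the episode. The exact functional form, the midpoint parameter C_mid, and the starting performance R0 are illustrative assumptions for this sketch, not the paper's verbatim parameterization.

```python
import numpy as np

def sigmoid_performance(compute, A, B, C_mid, R0=0.0):
    """Saturating (sigmoidal) compute-performance curve.

    A     -- asymptotic performance ceiling as compute grows
    B     -- efficiency exponent: larger B means a steeper rise
             around the midpoint
    C_mid -- compute at which performance is halfway between R0 and A
             (hypothetical parameter for this sketch)
    R0    -- performance before any RL training (assumed parameter)
    """
    return R0 + (A - R0) / (1.0 + (C_mid / compute) ** B)

# Evaluate across several orders of magnitude of compute (e.g. GPU-hours).
compute = np.logspace(1, 5, 9)
print(sigmoid_performance(compute, A=0.6, B=1.5, C_mid=1_000.0))
```

A quick sanity check on the shape: as compute grows, (C_mid / compute) ** B goes to zero and performance approaches the ceiling A, while B and C_mid only control how fast it gets there. That is the ceiling-versus-efficiency distinction the hosts draw in the ScaleRL discussion.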
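
The "cheaper research via extrapolation" point lends itself to a second sketch: fit the curve on inexpensive small-compute runs, then predict performance at a much larger budget. The data points below are synthetic, made up purely for illustration, and the fitting approach (SciPy's curve_fit) is this sketch's choice, not necessarily the paper's method.

```python
import numpy as np
from scipy.optimize import curve_fit

def sigmoid_performance(compute, A, B, C_mid):
    # Same saturating form as the previous sketch, with R0 fixed at 0.
    return A / (1.0 + (C_mid / compute) ** B)

# Synthetic "small-scale" observations up to 2,000 GPU-hours
# (fabricated numbers, for illustration only).
small_compute = np.array([50.0, 100.0, 250.0, 500.0, 1_000.0, 2_000.0])
observed = np.array([0.05, 0.08, 0.14, 0.22, 0.31, 0.40])

# Fit the ceiling A, efficiency B, and midpoint C_mid on the cheap runs.
params, _ = curve_fit(
    sigmoid_performance, small_compute, observed,
    p0=[0.6, 1.0, 1_000.0],
    bounds=([0.0, 0.0, 1.0], [1.0, 5.0, 1e6]),
)
A_hat, B_hat, C_mid_hat = params
print(f"fitted ceiling A ~ {A_hat:.2f}")

# Extrapolate 50x beyond the largest run we actually paid for.
print(f"predicted performance at 100,000 GPU-hours: "
      f"{sigmoid_performance(100_000.0, *params):.2f}")
```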