


Reinforcement learning from human feedback (RLHF) has come a long way. In this episode, research scientist Nathan Lambert talks to Jon Krohn about the technique’s origins. He also walks through other ways to fine-tune LLMs and explains how he believes generative AI might democratize education.
This episode is brought to you by AWS Inferentia (go.aws/3zWS0au) and AWS Trainium (go.aws/3ycV6K0), and Crawlbase (crawlbase.com), the ultimate data crawling platform. Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.
In this episode you will learn:
• Why it is important that AI is open [03:13]
• The efficacy and scalability of direct preference optimization [07:32]
• Robotics and LLMs [14:32]
• The challenges to aligning reward models with human preferences [23:00]
• How to make sure AI’s decision-making on preferences reflects desirable behavior [28:52]
• Why Nathan believes AI is closer to alchemy than science [37:38]
Additional materials: www.superdatascience.com/791
By Jon Krohn
