
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF).
Highlights include:
- How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes.
- Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality (a brief DPO sketch follows this list).
- The role of reward models in improving coding, math, reasoning, and other NLP tasks.
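
For readers unfamiliar with DPO, here is a minimal, illustrative PyTorch sketch of the DPO objective (Rafailov et al., 2023) the episode touches on; the function and tensor names are placeholders for illustration, not code from the episode.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Direct Preference Optimization loss.

    Each argument is a batch tensor of summed log-probabilities that the
    trainable policy / frozen reference model assigns to the preferred
    (chosen) or dispreferred (rejected) completion of each prompt.
    """
    # Implicit reward margins: how much more the policy favors each
    # response than the reference model does, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the log-sigmoid of the chosen-vs-rejected margin,
    # i.e., push the policy toward the human-preferred response.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

Unlike PPO-based RLHF, this formulation needs no separate reward model at training time: the preference signal is folded directly into a classification-style loss over paired responses.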
Connect with Brandon Cui:
https://www.linkedin.com/in/bcui19/