


Direct vs. RL methods for preferences, more RLHF models, and hard truths in open RLHF work. We have more questions than answers.
Read the full post here.
By Nathan Lambert
4.1 (99 ratings)
