TalkRL: The Reinforcement Learning Podcast

Arash Ahmadian on Rethinking RLHF



Arash Ahmadian is a researcher at Cohere and Cohere For AI, focused on preference training of large language models. He is also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker


Additional References

  • Self-Rewarding Language Models, Yuan et al. 2024
  • Reinforcement Learning: An Introduction, Sutton and Barto 1998
  • Learning from Delayed Rewards, Watkins 1989
  • Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

TalkRL: The Reinforcement Learning Podcast, by Robin Ranjit Singh Chauhan

Rating: 4.9 (29 ratings)

