March 25, 2024

Arash Ahmadian on Rethinking RLHF

Listen Later

33 minutes

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References

Self-Rewarding Language Models, Yuan et al 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1992
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

...more

View all episodes

View all episodes

Download on the App Store

Download on the App Store

Get it on Google Play

TalkRL: The Reinforcement Learning Podcast

By Robin Ranjit Singh Chauhan

4.9

2929 ratings

March 25, 2024

Arash Ahmadian on Rethinking RLHF

Listen Later

33 minutes

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References

Self-Rewarding Language Models, Yuan et al 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1992
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

...more

More shows like TalkRL: The Reinforcement Learning Podcast

Planet Money by NPR

Planet Money

30,666 Listeners

Making Sense with Sam Harris by Sam Harris

Making Sense with Sam Harris

26,250 Listeners

Conversations with Tyler by Mercatus Center at George Mason University

Conversations with Tyler

2,447 Listeners

The a16z Show by Andreessen Horowitz

The a16z Show

1,093 Listeners

Super Data Science: ML & AI Podcast with Jon Krohn by Jon Krohn

Super Data Science: ML & AI Podcast with Jon Krohn

301 Listeners

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas by Sean Carroll

Sean Carroll's Mindscape: Science, Society, Philosophy, Culture, Arts, and Ideas

4,170 Listeners

Practical AI by Daniel Whitenack and Chris Benson

Practical AI

208 Listeners

Google DeepMind: The Podcast by Hannah Fry

Google DeepMind: The Podcast

202 Listeners

All-In with Chamath, Jason, Sacks & Friedberg by All-In Podcast, LLC

All-In with Chamath, Jason, Sacks & Friedberg

10,182 Listeners

Machine Learning Street Talk (MLST) by Machine Learning Street Talk (MLST)

Machine Learning Street Talk (MLST)

99 Listeners

Dwarkesh Podcast by Dwarkesh Patel

Dwarkesh Podcast

576 Listeners

Hard Fork by The New York Times

Hard Fork

5,530 Listeners

No Priors: Artificial Intelligence | Technology | Startups by Conviction

No Priors: Artificial Intelligence | Technology | Startups

143 Listeners

Latent Space: The AI Engineer Podcast by Latent.Space

Latent Space: The AI Engineer Podcast

101 Listeners

The AI Daily Brief: Artificial Intelligence News and Analysis by Nathaniel Whittemore

The AI Daily Brief: Artificial Intelligence News and Analysis

682 Listeners