
The paper introduces Length Controlled Policy Optimization (LCPO), a method for training reasoning models to control the length of their output. Models trained with LCPO outperform existing length-control methods and allow compute to be allocated efficiently.
https://arxiv.org/abs/2503.04697
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers