
The paper introduces Length Controlled Policy Optimization (LCPO), a method for training reasoning models whose output length can be controlled. It outperforms existing methods while allowing efficient compute allocation.
https://arxiv.org/abs/2503.04697
YouTube: https://www.youtube.com/@ArxivPapers
TikTok: https://www.tiktok.com/@arxiv_papers
Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016
Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers