
This research paper proposes Reinforcement Learning from Execution Feedback (RLEF), a new method for improving the ability of large language models (LLMs) to generate code that successfully completes tasks. The authors demonstrate RLEF's effectiveness by training LLMs on CodeContests, a challenging competitive programming benchmark. RLEF trains models to iteratively refine their code based on feedback from executing it against test cases. The results show that RLEF significantly improves solve rates and reduces the number of code samples needed compared to previous approaches, achieving state-of-the-art performance. The paper also examines the inference-time behavior of RLEF-trained LLMs, highlighting their ability to learn from feedback and make targeted improvements over multiple code generations.
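To make the iterative execution-feedback loop described above more concrete, here is a minimal sketch in Python. It is an illustration, not the paper's implementation: the `generate` callable (standing in for an LLM call), the `run_against_tests` helper, the feedback message format, and the turn/timeout limits are all assumptions made for this example. In RLEF, the pass/fail outcome of a loop like this would supply the reward signal during reinforcement learning.

```python
# Illustrative sketch of a generate -> execute -> feedback loop.
# The model call, sandbox runner, and prompt format are hypothetical
# placeholders, not the actual code from the RLEF paper.
import subprocess
import sys
import tempfile


def run_against_tests(code: str, tests: list[tuple[str, str]]) -> list[str]:
    """Run candidate code on (stdin, expected stdout) pairs; return error messages."""
    errors = []
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    for stdin, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, path], input=stdin,
                capture_output=True, text=True, timeout=5,
            )
            if result.stdout.strip() != expected.strip():
                errors.append(f"input {stdin!r}: expected {expected!r}, got {result.stdout!r}")
        except subprocess.TimeoutExpired:
            errors.append(f"input {stdin!r}: time limit exceeded")
    return errors


def solve_iteratively(problem: str, public_tests, generate, max_turns: int = 3):
    """Ask the model for code, feed execution errors back, repeat up to max_turns."""
    conversation = [problem]
    for _ in range(max_turns):
        code = generate(conversation)              # hypothetical LLM call
        errors = run_against_tests(code, public_tests)
        if not errors:
            return code                            # all public tests pass
        conversation.append("Execution feedback:\n" + "\n".join(errors))
    return None                                    # no passing solution within the turn budget
```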