
This research paper proposes Reinforcement Learning from Execution Feedback (RLEF), a new method for improving the ability of large language models (LLMs) to generate code that successfully completes tasks. The authors demonstrate RLEF's effectiveness by training LLMs on CodeContests, a challenging competitive programming benchmark. RLEF trains models to iteratively refine their code based on feedback from executing it against test cases. The results show that RLEF significantly improves solve rates and reduces the number of code samples needed compared to previous approaches, achieving state-of-the-art performance. The paper also examines the inference-time behavior of RLEF-trained LLMs, highlighting their ability to learn from feedback and make targeted improvements over multiple code generations.
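To make the iterative execution-feedback loop described above more concrete, here is a minimal sketch in Python. It is an illustration, not the paper's implementation: the `generate` callable (standing in for an LLM call), the `run_against_tests` helper, the feedback message format, and the turn/timeout limits are all assumptions made for this example. In RLEF, the pass/fail outcome of a loop like this would supply the reward signal during reinforcement learning.

```python
# Illustrative sketch of a generate -> execute -> feedback loop.
# The model call, sandbox runner, and prompt format are hypothetical
# placeholders, not the actual code from the RLEF paper.
import subprocess
import sys
import tempfile


def run_against_tests(code: str, tests: list[tuple[str, str]]) -> list[str]:
    """Run candidate code on (stdin, expected stdout) pairs; return error messages."""
    errors = []
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    for stdin, expected in tests:
        try:
            result = subprocess.run(
                [sys.executable, path], input=stdin,
                capture_output=True, text=True, timeout=5,
            )
            if result.stdout.strip() != expected.strip():
                errors.append(f"input {stdin!r}: expected {expected!r}, got {result.stdout!r}")
        except subprocess.TimeoutExpired:
            errors.append(f"input {stdin!r}: time limit exceeded")
    return errors


def solve_iteratively(problem: str, public_tests, generate, max_turns: int = 3):
    """Ask the model for code, feed execution errors back, repeat up to max_turns."""
    conversation = [problem]
    for _ in range(max_turns):
        code = generate(conversation)              # hypothetical LLM call
        errors = run_against_tests(code, public_tests)
        if not errors:
            return code                            # all public tests pass
        conversation.append("Execution feedback:\n" + "\n".join(errors))
    return None                                    # no passing solution within the turn budget
```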