Reflexion is a novel framework designed to improve Large Language Models (LLMs) acting as goal-driven agents by teaching them to learn from past mistakes.
Here is a short summary of the paper's key points:
- The Problem: Traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning, making it difficult for language agents to learn quickly and efficiently from trial and error.
- The Solution: The authors propose "verbal reinforcement learning," where agents are reinforced through linguistic feedback rather than by updating the model's weights.
- How it Works: The framework consists of three models: an Actor (generates actions/text), an Evaluator (scores the outputs), and a Self-Reflection model (generates verbal reinforcement cues). The agent converts feedback from its environment into a textual summary of its mistakes, stores this in an episodic memory buffer, and uses it as a "semantic gradient signal" to plan better actions in future attempts (see the sketch after this list).
- Key Advantages: Reflexion is lightweight because it does not require fine-tuning the LLM. It also allows for more nuanced feedback than a scalar reward and creates an explicit, interpretable episodic memory of past lessons.
- Results: Reflexion significantly outperforms baseline agents across diverse tasks, including an absolute 22% improvement in sequential decision-making (AlfWorld) and a 20% improvement in reasoning (HotPotQA). Most notably, it achieves 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 (80%).
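To make the Actor/Evaluator/Self-Reflection loop concrete, here is a minimal Python sketch of a single-task Reflexion trial loop. The `llm` and `evaluate` callables are hypothetical stand-ins (any text-completion function and any task-specific pass/fail check), and the prompt wording is illustrative, not the paper's actual templates:

```python
from typing import Callable, List

def reflexion_loop(
    task: str,
    llm: Callable[[str], str],        # hypothetical text-completion function
    evaluate: Callable[[str], bool],  # hypothetical task-specific pass/fail check
    max_trials: int = 3,
) -> str:
    """Run a simplified Reflexion trial loop for a single task."""
    memory: List[str] = []  # episodic memory buffer of verbal reflections
    attempt = ""

    for trial in range(max_trials):
        # Actor: generate an attempt, conditioned on reflections from past trials.
        prompt = f"Task: {task}\n"
        if memory:
            prompt += "Lessons from previous attempts:\n"
            prompt += "\n".join(f"- {m}" for m in memory) + "\n"
        prompt += "Attempt:"
        attempt = llm(prompt)

        # Evaluator: score the attempt (here reduced to a binary pass/fail signal).
        if evaluate(attempt):
            return attempt  # success: stop early

        # Self-Reflection: convert the failure into a verbal reinforcement cue
        # and store it in the episodic memory buffer for the next trial.
        reflection = llm(
            f"Task: {task}\nFailed attempt: {attempt}\n"
            "In one sentence, explain what went wrong and how to fix it:"
        )
        memory.append(reflection.strip())

    return attempt  # best effort after exhausting the trial budget
```

In the paper, the Evaluator ranges from exact-match checks and heuristics to LLM judges and self-generated unit tests (in the coding experiments), and the memory buffer is capped at a few recent reflections to fit within the context window.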