Reflexion is a novel framework designed to improve Large Language Models (LLMs) acting as goal-driven agents by teaching them to learn from past mistakes.
Here is a short summary of the paper's key points:
- The Problem: Traditional reinforcement learning methods require extensive training samples and expensive model fine-tuning, making it difficult for language agents to learn quickly and efficiently from trial and error.
- The Solution: The authors propose "verbal reinforcement learning," where agents are reinforced through linguistic feedback rather than by updating the model's weights.
- How it Works: The framework consists of three models: an Actor (generates actions/text), an Evaluator (scores the outputs), and a Self-Reflection model (generates verbal reinforcement cues). The agent converts feedback from its environment into a textual summary of its mistakes, stores this in an episodic memory buffer, and uses it as a "semantic gradient signal" to plan better actions in future attempts (see the sketch after this list).
- Key Advantages: Reflexion is lightweight because it does not require fine-tuning the LLM. It also allows for more nuanced feedback than a scalar reward and creates an explicit, interpretable episodic memory of past lessons.
- Results: Reflexion significantly outperforms baseline agents across diverse tasks, including an absolute 22% improvement in sequential decision-making (AlfWorld) and a 20% improvement in reasoning (HotPotQA). Most notably, it achieves 91% pass@1 accuracy on the HumanEval coding benchmark, surpassing the previous state-of-the-art GPT-4 (80%).
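To make the Actor/Evaluator/Self-Reflection loop concrete, here is a minimal Python sketch of a single-task Reflexion trial loop. The `llm` and `evaluate` callables are hypothetical stand-ins (any text-completion function and any task-specific pass/fail check), and the prompt wording is illustrative, not the paper's actual templates:

```python
from typing import Callable, List

def reflexion_loop(
    task: str,
    llm: Callable[[str], str],        # hypothetical text-completion function
    evaluate: Callable[[str], bool],  # hypothetical task-specific pass/fail check
    max_trials: int = 3,
) -> str:
    """Run a simplified Reflexion trial loop for a single task."""
    memory: List[str] = []  # episodic memory buffer of verbal reflections
    attempt = ""

    for trial in range(max_trials):
        # Actor: generate an attempt, conditioned on reflections from past trials.
        prompt = f"Task: {task}\n"
        if memory:
            prompt += "Lessons from previous attempts:\n"
            prompt += "\n".join(f"- {m}" for m in memory) + "\n"
        prompt += "Attempt:"
        attempt = llm(prompt)

        # Evaluator: score the attempt (here reduced to a binary pass/fail signal).
        if evaluate(attempt):
            return attempt  # success: stop early

        # Self-Reflection: convert the failure into a verbal reinforcement cue
        # and store it in the episodic memory buffer for the next trial.
        reflection = llm(
            f"Task: {task}\nFailed attempt: {attempt}\n"
            "In one sentence, explain what went wrong and how to fix it:"
        )
        memory.append(reflection.strip())

    return attempt  # best effort after exhausting the trial budget
```

In the paper, the Evaluator ranges from exact-match checks and heuristics to LLM judges and self-generated unit tests (in the coding experiments), and the memory buffer is capped at a few recent reflections to fit within the context window.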