


This paper presents a framework for reflective test-time planning that improves the performance of embodied large language models (LLMs) on robotic tasks. The system uses double-loop learning: agents re-evaluate their past decisions through hindsight assessments to correct underlying strategic errors. By combining internal reflection for immediate action scoring with retrospective reflection for long-term credit assignment, the model adapts its policy at deployment without requiring additional pretraining data. Experiments on household and cupboard-fitting tasks show that the approach significantly reduces execution waste and improves success rates over standard baselines. The authors also employ Low-Rank Adaptation (LoRA) to update the models efficiently, letting the robots learn from their own trial and error in real-time environments.
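The double-loop idea described above can be illustrated with a toy sketch. This is not the paper's code: the action names, the preference table, and the update rule are all hypothetical stand-ins. The inner loop ("internal reflection") scores candidate actions before acting; the outer loop ("retrospective reflection") reassigns credit over the whole trajectory after an episode ends.

```python
# Toy sketch of double-loop test-time reflection (illustrative only).
# Inner loop: score candidates and pick the best before acting.
# Outer loop: after an episode, assign credit to every action taken,
# standing in for the policy update the paper performs via LoRA.

def internal_reflection(preferences, candidates):
    """Inner loop: pick the candidate action with the highest preference."""
    return max(candidates, key=lambda a: preferences.get(a, 0.0))

def retrospective_reflection(preferences, trajectory, success, lr=0.5):
    """Outer loop: hindsight credit assignment. Reward every action on a
    successful trajectory, penalize every action on a failed one."""
    delta = lr if success else -lr
    for action in trajectory:
        preferences[action] = preferences.get(action, 0.0) + delta
    return preferences

def run_episode(preferences, candidates, goal_action, max_steps=3):
    """One trial: act greedily via internal reflection, then reflect in
    hindsight on the full trajectory."""
    trajectory = []
    for _ in range(max_steps):
        action = internal_reflection(preferences, candidates)
        trajectory.append(action)
        if action == goal_action:
            return retrospective_reflection(preferences, trajectory, True), True
    return retrospective_reflection(preferences, trajectory, False), False

# Start with a preference biased toward the wrong action; the first trial
# fails, and hindsight reflection flips the preference for the second trial.
prefs = {"open_drawer": 1.0, "open_cupboard": 0.0}
candidates = ["open_drawer", "open_cupboard"]
prefs, ok1 = run_episode(prefs, candidates, "open_cupboard")
prefs, ok2 = run_episode(prefs, candidates, "open_cupboard")
```

The point of the two loops is that the inner loop alone would keep repeating the same strategic error; only the outer, retrospective loop can correct the underlying preference.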
By Enoch H. Kang
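The LoRA updates mentioned in the summary can also be sketched in miniature. The idea of Low-Rank Adaptation is to freeze the original weight matrix W and train only two small factors A and B, using W + B·A as the effective weights. The sizes and values below are illustrative, not drawn from the paper.

```python
# Toy illustration of Low-Rank Adaptation (LoRA), in plain Python.
# The full d x d matrix W stays frozen; only the rank-r factors
# A (r x d) and B (d x r) would be trained, so a rank-1 update here
# costs 2*d*r = 8 values instead of d*d = 16.

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_adapted(W, A, B):
    """Effective weights W + B @ A; W is frozen, A and B are the adapter."""
    BA = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, BA)]

d, r = 4, 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen
A = [[0.1, 0.2, 0.3, 0.4]]                # r x d adapter factor
B = [[1.0], [0.0], [0.0], [0.0]]          # d x r adapter factor

W_eff = lora_adapted(W, A, B)  # only the first row of W is perturbed
```

This is why LoRA suits on-robot learning: each reflective update touches only the small adapter factors, leaving the pretrained weights untouched.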