RAG-Modulo: Solving Sequential Tasks using Experience, Critics, and Language Models
RAG-Modulo addresses the need for learning-based robotic systems to retain and learn from experience, though the contribution is incremental: it builds on existing LLM-based agents by adding a memory component.
The paper tackles the inability of LLM-based agents to learn from past interactions in robotic tasks. It proposes RAG-Modulo, which adds a memory of past interactions and critics to such agents, and reports significant improvements in task success rates and efficiency in the BabyAI and AlfWorld domains.
Large language models (LLMs) have recently emerged as promising tools for solving challenging robotic tasks, even in the presence of action and observation uncertainties. Recent LLM-based decision-making methods (also referred to as LLM-based agents), when paired with appropriate critics, have demonstrated potential in solving complex, long-horizon tasks with relatively few interactions. However, most existing LLM-based agents lack the ability to retain and learn from past interactions, an essential trait of learning-based robotic systems. We propose RAG-Modulo, a framework that enhances LLM-based agents with a memory of past interactions and incorporates critics to evaluate the agents' decisions. The memory component allows the agent to automatically retrieve and incorporate relevant past experiences as in-context examples, providing context-aware feedback for more informed decision-making. Further, by updating its memory, the agent improves its performance over time, thereby exhibiting learning. Through experiments in the challenging BabyAI and AlfWorld domains, we demonstrate significant improvements in task success rates and efficiency, showing that the proposed RAG-Modulo framework outperforms state-of-the-art baselines.
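The retrieve, propose, critique, and store loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Interaction`, `Memory`, `syntax_critic`, `step`, and `stub_policy` are hypothetical, retrieval uses a simple bag-of-words cosine similarity as a stand-in for whatever retriever the paper uses, the LLM is replaced by a stub function, and an interaction is stored as soon as it passes the critic, which is a simplifying assumption.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Interaction:
    task: str         # natural-language goal
    observation: str  # what the agent observed
    action: str       # action that passed the critic


def _bow(text):
    # Bag-of-words vector; a stand-in for a learned text embedding.
    return Counter(text.lower().split())


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


@dataclass
class Memory:
    items: list = field(default_factory=list)

    def add(self, interaction):
        self.items.append(interaction)

    def retrieve(self, task, observation, k=2):
        # Return the k stored interactions most similar to the current situation.
        query = _bow(task + " " + observation)
        ranked = sorted(
            self.items,
            key=lambda it: _cosine(query, _bow(it.task + " " + it.observation)),
            reverse=True,
        )
        return ranked[:k]


def syntax_critic(action, valid_actions):
    # Hypothetical critic: accepts only actions the environment recognizes.
    if action in valid_actions:
        return True, ""
    return False, f"'{action}' is not one of {sorted(valid_actions)}"


def step(policy, memory, task, observation, valid_actions, max_retries=3):
    # Retrieve in-context examples, query the policy, and retry with critic feedback.
    examples = memory.retrieve(task, observation)
    feedback = ""
    for _ in range(max_retries):
        action = policy(task, observation, examples, feedback)
        ok, feedback = syntax_critic(action, valid_actions)
        if ok:
            memory.add(Interaction(task, observation, action))  # memory update
            return action
    return None  # no action passed the critic within the retry budget


def stub_policy(task, observation, examples, feedback):
    # Stand-in for an LLM call: reuse a retrieved example if one exists,
    # otherwise guess badly at first and correct after critic feedback.
    if examples and not feedback:
        return examples[0].action
    return "go forward" if feedback else "fly"
```

With an empty memory, the stub policy needs one round of critic feedback before producing a valid action; on a later, similar task the retrieved example lets it answer on the first attempt, which mirrors in miniature the improvement over time that the abstract attributes to memory updates.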