CL | Mar 3, 2025

Improving Retrospective Language Agents via Joint Policy Gradient Optimization

Xueyang Feng, Bo Lan, Quanyu Dai, Lei Wang, Jiakai Tang, Xu Chen, Zhenhua Dong, Ji-Rong Wen

arXiv:2503.01490v118.813 citations | h-index: 23 | Has Code | NAACL

Originality: Incremental advance

AI Analysis

The paper addresses the problem of enabling continuous learning and self-improvement in language agents. The contribution appears incremental, as it builds on existing fine-tuning and reinforcement learning methods.

The paper tackles the limitations of prompt-based and fine-tuned language agents by introducing RetroAct, a framework that jointly optimizes task-planning and self-reflective evolution, resulting in significant performance improvements for open-source models and reduced reliance on closed-source LLMs.

Large language models (LLMs) have sparked great interest in building autonomous agents. However, current prompt-based agents often rely heavily on large-scale LLMs. Meanwhile, although fine-tuning methods significantly enhance the capabilities of smaller LLMs, the fine-tuned agents often lack the potential for self-reflection and self-improvement. To address these challenges, we introduce RetroAct, a novel agent framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. Specifically, we develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning, and design an off-policy joint policy gradient optimization algorithm with imitation learning regularization to enhance data efficiency and training stability in agent tasks. RetroAct significantly improves the performance of open-source models, reduces dependency on closed-source LLMs, and enables fine-tuned agents to learn and evolve continuously. We conduct extensive experiments across various testing environments, demonstrating that RetroAct achieves substantial improvements in task performance and decision-making processes.
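The abstract does not give the objective itself, so the following is only a minimal sketch of what an off-policy policy gradient loss with an imitation learning regularizer could look like. Everything here is an assumption made for illustration: the function names, the PPO-style clipped importance ratio (a swapped-in, widely used off-policy correction, not necessarily the one RetroAct uses), the weight `beta`, and the choice to sum one loss for the planner and one for the reflector as the "joint" objective.

```python
import math
from typing import Sequence


def off_policy_pg_loss(
    logp_new: Sequence[float],       # log-probs of sampled actions under the current policy
    logp_behavior: Sequence[float],  # log-probs of the same actions under the data-collecting policy
    advantages: Sequence[float],     # advantage (or reward-to-go) estimate per sample
    expert_logp: Sequence[float],    # current-policy log-probs of expert (demonstration) actions
    beta: float = 0.1,               # hypothetical weight of the imitation regularizer
    clip: float = 0.2,               # hypothetical clipping range for the importance ratio
) -> float:
    """Clipped importance-weighted policy gradient loss plus an imitation term."""
    if not (len(logp_new) == len(logp_behavior) == len(advantages)):
        raise ValueError("per-sample inputs must have the same length")
    if not logp_new or not expert_logp:
        raise ValueError("inputs must be non-empty")

    pg_terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_behavior, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance weight for off-policy data
        clipped = max(1.0 - clip, min(1.0 + clip, ratio))
        pg_terms.append(min(ratio * adv, clipped * adv))

    pg_loss = -sum(pg_terms) / len(pg_terms)      # maximize the surrogate objective
    il_loss = -sum(expert_logp) / len(expert_logp)  # negative log-likelihood of expert actions
    return pg_loss + beta * il_loss


def joint_loss(planner_batch: dict, reflector_batch: dict) -> float:
    """Assumed 'joint' objective: sum of the planner and reflector losses."""
    return off_policy_pg_loss(**planner_batch) + off_policy_pg_loss(**reflector_batch)
```

In this sketch the imitation term pulls the policy toward demonstration data whenever the reinforcement signal is noisy, which is one plausible reading of how such a regularizer could support training stability.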
