CL · May 20

Auto-Dreamer: Learning Offline Memory Consolidation for Language Agents

Chongrui Ye, Yuxiang Liu, Yu Wang, Haofei Yu, Yining Zhao, Ge Liu, Julian McAuley, Jiaxuan You

arXiv:2605.20616 · 92.71 citations

Predicted impact: top 14% in CL · last 90 days

Originality: Highly original

AI Analysis

For developers of language agents operating over multiple tasks, this work addresses the bottleneck of converting accumulated experience into reusable knowledge without retraining.

Auto-Dreamer is a learned offline memory consolidator for language agents that decouples fast per-session acquisition from slow cross-session abstraction. Trained on ScienceWorld trajectories alone, it beats the memory baselines on ScienceWorld by 7 points with an active memory bank 12× smaller than the strongest baseline, and it continues to lead on held-out ALFWorld and WebArena without retraining, using 6× less memory than the strongest baseline on ALFWorld.

Language agents increasingly operate over streams of related tasks, yet existing memory systems struggle to convert accumulated experience into reusable knowledge. Retrieval-augmented and structured memory methods record per-session observations effectively, but often couple acquisition and consolidation into a single online process, leaving the agent without a global view across sessions to discover recurring patterns, abstract shared procedures, or prune redundant entries. Inspired by complementary learning systems theory, we propose Auto-Dreamer, a learned offline consolidator for language-agent memory. Auto-Dreamer decouples fast per-session memory acquisition from slow cross-session consolidation. Given a selected working region of a typed memory bank, the consolidator treats the region as read-only evidence, performs bounded tool-use to inspect entries and provenance-linked source trajectories, and synthesizes a fresh compact replacement set that abstracts across sessions and supersedes the original region. We train Auto-Dreamer via GRPO, using end-to-end agent performance as the reward signal to learn how to consolidate memories acquired through fast online experience. Trained on ScienceWorld trajectories alone, Auto-Dreamer outperforms fixed, RL-trained, and prompted memory baselines on ScienceWorld by 7 points while using an active memory bank 12× smaller than the strongest baseline, and continues to lead on held-out ALFWorld and WebArena without retraining, using 6× less memory than the strongest baseline on ALFWorld.
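The consolidation step described in the abstract can be pictured with a small sketch. This is not the paper's implementation: every name here (`MemoryEntry`, `ReadOnlyRegion`, `consolidate`, `merge_duplicates`, `max_calls`) is hypothetical, and the learned consolidator is replaced by a hand-written duplicate-merging rule purely for illustration. The sketch only shows the data flow: a selected region is exposed as read-only evidence behind a bounded tool budget, and the returned replacement set supersedes that region in the bank.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class MemoryEntry:
    """One typed memory entry. All field names are illustrative assumptions."""
    entry_id: str
    kind: str                  # entry type, e.g. "procedure" or "observation"
    text: str
    sources: Tuple[str, ...]   # provenance: ids of source trajectories


class ToolBudgetExceeded(RuntimeError):
    """Raised when the consolidator asks for more inspections than allowed."""


class ReadOnlyRegion:
    """Read-only view of the working region with a bounded number of tool calls."""

    def __init__(self, entries: List[MemoryEntry],
                 trajectories: Dict[str, str], max_calls: int) -> None:
        self._entries = {e.entry_id: e for e in entries}
        self._trajectories = trajectories
        self._calls_left = max_calls

    def _spend(self) -> None:
        if self._calls_left <= 0:
            raise ToolBudgetExceeded("tool-call budget exhausted")
        self._calls_left -= 1

    def entry_ids(self) -> List[str]:
        return list(self._entries)  # listing ids is free in this sketch

    def inspect_entry(self, entry_id: str) -> MemoryEntry:
        self._spend()
        return self._entries[entry_id]  # entries are frozen, so callers cannot mutate them

    def inspect_source(self, trajectory_id: str) -> str:
        self._spend()
        return self._trajectories[trajectory_id]


Policy = Callable[[ReadOnlyRegion], List[MemoryEntry]]


def consolidate(bank: Dict[str, MemoryEntry], region_ids: List[str],
                trajectories: Dict[str, str], policy: Policy,
                max_calls: int = 8) -> Dict[str, MemoryEntry]:
    """Return a new bank in which the replacement set supersedes the region."""
    view = ReadOnlyRegion([bank[i] for i in region_ids], trajectories, max_calls)
    replacement = policy(view)
    region = set(region_ids)
    new_bank = {k: v for k, v in bank.items() if k not in region}
    for entry in replacement:
        new_bank[entry.entry_id] = entry
    return new_bank


def merge_duplicates(view: ReadOnlyRegion) -> List[MemoryEntry]:
    """Toy stand-in for the learned consolidator: merge same-kind, same-text entries."""
    groups: Dict[Tuple[str, str], List[MemoryEntry]] = {}
    for entry_id in view.entry_ids():
        entry = view.inspect_entry(entry_id)
        groups.setdefault((entry.kind, entry.text.strip().lower()), []).append(entry)
    merged = []
    for n, ((kind, _), members) in enumerate(groups.items()):
        sources = tuple(sorted({s for m in members for s in m.sources}))
        merged.append(MemoryEntry(f"c{n}", kind, members[0].text, sources))
    return merged
```

In this sketch the original bank is left untouched and a new bank is returned, which mirrors the "read-only evidence, fresh replacement set" framing. In the paper's setting the `policy` slot would be filled by the GRPO-trained consolidator rather than a fixed rule, and the replacement ids would need to avoid collisions with entries outside the region.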
