LG | Sep 30, 2025

Memory-Driven Self-Improvement for Decision Making with Large Language Models

arXiv:2509.26340v1 | 1 citation | h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the inefficient adaptation of LLMs to specific decision-making tasks, a problem relevant to AI researchers and practitioners. It is an incremental advance that integrates a memory of task experience with LLM priors.

The paper tackles the challenge of adapting large language models (LLMs) to specific sequential decision-making tasks when task data is limited. It proposes a memory-driven self-improvement framework that combines the LLM's general knowledge with domain-specific experience, and reports performance improvements of over 40% on in-distribution tasks and over 75% on unseen tasks in ALFWorld.

Large language models (LLMs) have emerged as effective action policies for sequential decision-making (SDM) tasks due to their extensive prior knowledge. However, this broad yet general knowledge is often insufficient for specific decision-making tasks with limited task-related data, making it challenging to efficiently adapt LLMs to specific SDM tasks. To address this challenge, we propose a memory-driven self-improvement framework that combines LLM general prior knowledge with a compact memory of domain-specific experiences. Memory retains past interactions and associated Q-values, thereby capturing decision-relevant knowledge that facilitates accurate value estimation and informs the LLM prior refinement. The refined LLM prior, in turn, generates higher-reward trajectories that further enrich memory, forming a natural self-improvement framework where memory and LLM prior mutually reinforce each other. Experiments show that our memory-driven approach significantly outperforms both traditional RL and LLM-based baselines, e.g., improving performance by over 40% on in-distribution tasks and over 75% when generalized to unseen tasks in ALFWorld.
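The abstract does not spell out how memory and the LLM prior are combined, so the following is only a minimal sketch of one plausible reading, not the paper's algorithm. It assumes a tabular memory keyed by (state, action), a one-step Q-learning update, and an action score of log-prior plus Q-value divided by a temperature. The names `QMemory`, `uniform_prior`, and `choose_action` are hypothetical, and `uniform_prior` is a stand-in for a real LLM that would assign probabilities to candidate actions.

```python
import math


class QMemory:
    """Hypothetical memory: stores a Q-value estimate per (state, action) pair."""

    def __init__(self, lr=0.5, gamma=0.9):
        self.q = {}          # (state, action) -> estimated Q-value
        self.lr = lr         # learning rate (assumed hyperparameter)
        self.gamma = gamma   # discount factor (assumed hyperparameter)

    def value(self, state, action):
        # Unseen pairs default to 0.0, so the prior alone decides for them.
        return self.q.get((state, action), 0.0)

    def update(self, state, action, reward, next_state, next_actions):
        # One-step Q-learning target; this update rule is an assumption.
        best_next = max(
            (self.value(next_state, a) for a in next_actions), default=0.0
        )
        target = reward + self.gamma * best_next
        old = self.value(state, action)
        self.q[(state, action)] = old + self.lr * (target - old)


def uniform_prior(state, actions):
    """Stand-in for an LLM prior: uniform probability over candidate actions."""
    return {a: 1.0 / len(actions) for a in actions}


def choose_action(state, actions, memory, prior_fn=uniform_prior, temperature=1.0):
    """Pick the action maximizing log prior(a | s) + Q(s, a) / temperature."""
    prior = prior_fn(state, actions)
    scores = {
        a: math.log(prior[a]) + memory.value(state, a) / temperature
        for a in actions
    }
    return max(actions, key=lambda a: scores[a])


if __name__ == "__main__":
    memory = QMemory()
    actions = ["go to sofa", "open fridge"]
    # Record one rewarded interaction, then let memory steer the choice.
    memory.update("kitchen", "open fridge", 1.0, "done", [])
    print(choose_action("kitchen", actions, memory))
```

In this sketch, a rewarded interaction raises the stored Q-value for that action, which tilts later choices toward it even though the prior is unchanged. The refinement of the LLM prior itself, which the abstract describes as the other half of the loop, is not modeled here.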

