GLOVE: Global Verifier for LLM Memory-Environment Realignment
GLOVE addresses memory reliability for LLM-based agents in practical, non-stationary environments. It represents a new design dimension for memory systems rather than an incremental improvement.
The paper tackles memory validity for LLM agents under dynamic environmental drift. It proposes GLOVE, a framework that uses active probing to detect inconsistencies between memory and the environment and to realign memory without ground-truth supervision, and it reports substantial improvements in agent success rates on diverse benchmarks.
Most existing memory-enhanced Large Language Model (LLM) approaches implicitly assume that memory validity can be established either through external evaluators that provide task-specific success signals or through internal model cognition, such as reflection, for editing memory entries. However, these assumptions often break down in practical environments with dynamic drifts. We propose the Global Verifier (GLOVE), a framework that introduces a new design dimension for LLM memory systems by establishing a relative notion of truth. Through active probing to detect inconsistencies between retrieved memories and fresh observations, GLOVE enables memory-environment realignment by verifying and updating memory without access to ground-truth supervision or strong reliance on model introspection. We evaluate GLOVE on diverse benchmarks spanning web navigation, planning, and control, augmented with controlled environmental drifts that introduce non-stationarity beyond the original benchmark settings. Our results show that GLOVE substantially improves agent success rates, suggesting a robust pathway to cognitive agents capable of self-evolution.
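The probe-compare-update loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `VerifiedMemory` class, the `probe` callable, and the dictionary-based environment are all hypothetical names chosen for illustration. The sketch assumes that memory maps a (state, action) pair to a remembered outcome and that the agent can cheaply re-execute an action to obtain a fresh observation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Optional, Tuple

Key = Tuple[Hashable, Hashable]  # (state, action)


@dataclass
class VerifiedMemory:
    """Toy memory that checks each retrieved entry against a fresh probe.

    `probe` is a hypothetical callable that re-executes `action` in `state`
    and returns the outcome currently observed in the environment.
    """

    probe: Callable[[Hashable, Hashable], Hashable]
    entries: Dict[Key, Hashable] = field(default_factory=dict)

    def write(self, state: Hashable, action: Hashable, outcome: Hashable) -> None:
        self.entries[(state, action)] = outcome

    def recall(self, state: Hashable, action: Hashable) -> Optional[Hashable]:
        key = (state, action)
        if key not in self.entries:
            return None  # nothing remembered, so nothing to verify
        remembered = self.entries[key]
        observed = self.probe(state, action)  # active probing
        if observed != remembered:
            # Inconsistency detected: the fresh observation is treated as the
            # relative truth and overwrites the stale entry. No ground-truth
            # label or model introspection is consulted.
            self.entries[key] = observed
        return self.entries[key]


# Hypothetical environment: a lookup table whose contents can drift.
env = {("door", "push"): "opens"}
memory = VerifiedMemory(probe=lambda state, action: env[(state, action)])
memory.write("door", "push", "opens")

env[("door", "push")] = "locked"  # controlled drift after the memory was written
result = memory.recall("door", "push")  # the stale entry is realigned to "locked"
```

In this sketch every recall triggers a probe, which is the simplest possible policy. A practical system would likely need to decide when probing is worth its cost, and the abstract does not specify how GLOVE makes that choice.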