CL · Nov 21, 2025

A Simple Yet Strong Baseline for Long-Term Conversational Memory of LLM Agents

arXiv:2511.17208v2 · 9 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses long-term conversational memory for LLM agents. Its contribution is incremental but practical: a structurally simple memory design that holds up against strong baselines.

The paper targets coherent, personalized interaction across many sessions in LLM-based conversational agents. It proposes an event-centric memory representation that preserves information in a non-compressive form, and it matches or surpasses strong baselines on the LoCoMo and LongMemEval$_S$ benchmarks while using much shorter QA contexts.

LLM-based conversational agents still struggle to maintain coherent, personalized interaction over many sessions: fixed context windows limit how much history can be kept in view, and most external memory approaches trade off between coarse retrieval over large chunks and fine-grained but fragmented views of the dialogue. Motivated by neo-Davidsonian event semantics, we propose an event-centric alternative that represents conversational history as short, event-like propositions which bundle together participants, temporal cues, and minimal local context, rather than as independent relation triples or opaque summaries. In contrast to work that aggressively compresses or forgets past content, our design aims to preserve information in a non-compressive form and make it more accessible, rather than more lossy. Concretely, we instruct an LLM to decompose each session into enriched elementary discourse units (EDUs) -- self-contained statements with normalized entities and source turn attributions -- and organize sessions, EDUs, and their arguments in a heterogeneous graph that supports associative recall. On top of this representation we build two simple retrieval-based variants that use dense similarity search and LLM filtering, with an optional graph-based propagation step to connect and aggregate evidence across related EDUs. Experiments on the LoCoMo and LongMemEval$_S$ benchmarks show that these event-centric memories match or surpass strong baselines, while operating with much shorter QA contexts. Our results suggest that structurally simple, event-level memory provides a principled and practical foundation for long-horizon conversational agents. Our code and data will be released at https://github.com/KevinSRR/EMem.
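
To make the representation described in the abstract more concrete, here is a minimal Python sketch of an EDU store with argument and session links and a one-hop propagation step. It is an illustration under stated assumptions, not the authors' implementation: the names `EDU`, `EventMemory`, `embed`, and `cosine` are hypothetical, the bag-of-words `embed` function merely stands in for a dense encoder, and the LLM decomposition and LLM filtering steps are omitted entirely.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class EDU:
    """Hypothetical record for one enriched elementary discourse unit."""
    edu_id: str
    session_id: str
    text: str            # self-contained statement
    arguments: tuple     # normalized entities and temporal cues
    source_turns: tuple  # indices of the turns the statement came from


def embed(text):
    # Assumption: a bag-of-words count vector stands in for a dense encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class EventMemory:
    """Toy heterogeneous graph: session, EDU, and argument nodes."""

    def __init__(self):
        self.edus = {}
        self.by_argument = defaultdict(set)  # argument node -> EDU ids
        self.by_session = defaultdict(set)   # session node -> EDU ids

    def add(self, edu):
        self.edus[edu.edu_id] = edu
        self.by_session[edu.session_id].add(edu.edu_id)
        for arg in edu.arguments:
            self.by_argument[arg].add(edu.edu_id)

    def retrieve(self, query, k=1, propagate=True):
        q = embed(query)
        ranked = sorted(
            self.edus.values(),
            key=lambda e: cosine(q, embed(e.text)),
            reverse=True,
        )
        seeds = [e.edu_id for e in ranked[:k]]
        if not propagate:
            return seeds
        # One-hop propagation: add EDUs that share an argument with a seed.
        result = list(seeds)
        for seed_id in seeds:
            for arg in self.edus[seed_id].arguments:
                for neighbor in sorted(self.by_argument[arg]):
                    if neighbor not in result:
                        result.append(neighbor)
        return result


memory = EventMemory()
memory.add(EDU("e1", "s1", "Alice adopted a puppy named Max in March 2023",
               ("Alice", "Max", "March 2023"), (3,)))
memory.add(EDU("e2", "s2", "Max finished obedience training", ("Max",), (7,)))
memory.add(EDU("e3", "s2", "Bob moved to Berlin for a new job",
               ("Bob", "Berlin"), (9,)))

print(memory.retrieve("who adopted a puppy", k=1, propagate=False))
print(memory.retrieve("who adopted a puppy", k=1))
```

In this toy example, similarity search alone returns only the adoption statement, while the propagation step also surfaces the later statement about Max because both EDUs share the argument node `Max`. That is the kind of associative recall the abstract attributes to the graph, shown here in a deliberately simplified form.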
