AIMay 14

A Heterogeneous Temporal Memory Governance Framework for Long-Term LLM Persona Consistency

Zhao Yang, Wang Huan, Li Yingshuo, Tu Haomiao, Lin Hujite

arXiv:2605.148029.5

AI Analysis

For developers of conversational AI, ARPM provides a white-box, auditable approach to long-term persona consistency, though results are preliminary and based on engineering logs.

ARPM, an external temporal memory governance framework, maintains persona consistency in LLMs over long dialogues by separating static and dynamic memory and using retrieval-augmented verification. Under high noise (1:200+), manual review achieves 80% recall vs 44% for automatic CSV, and disabling dialogue history drops strict accuracy from 100% to 66.7%.

Large language models often suffer from fact loss, timeline confusion, persona drift, and reduced stability during long-range interaction, especially under high-noise knowledge bases, context clearing, and cross-model transfer. To address these issues, we introduce ARPM, an external temporal memory governance framework for long-term dialogue. ARPM separates static knowledge memory from dynamic dialogue experience memory and combines vector retrieval, BM25, RRF fusion, dual-temporal reranking, chronological evidence reading, and a controlled analysis protocol for evidence verification and answer binding. Unlike approaches that encode persona consistency into model weights or rely only on long context, ARPM treats continuity as a traceable, auditable, and transferable governance problem. Using engineering logs, we conduct three experiments. First, in a 50-round question-answering setting, we compare signal-to-noise ratios of 1:5 and 1:200+, and distinguish CSV auto-judgment from manual review. Under 1:5, CSV recall accuracy is 54.0%, while manual review raises it to 100.0%. Under 1:200+, the values are 44.0% and 80.0%. These results show that automatic rules can underestimate recall after supporting evidence enters the prompt. Second, ablation results show that dialogue history retrieval is necessary for recent continuity: disabling it reduces strict accuracy from 100% to 66.7%, and disabling BM25 reduces it to 80.0%, indicating that pure semantic retrieval is insufficient for correction and tracing. Third, under a 5.1-million-character noise substrate, periodic context clearing, and multi-model handoff, ARPM maintains semantic continuity, boundary continuity, and persona consistency, while exposing limits caused by weak protocol compliance. These findings show that long-term persona consistency can be decomposed into governable components and evaluated in a white-box manner.

View on arXiv PDF

Similar