CL · AI · Mar 14

Memory-Driven Role-Playing: Evaluation and Enhancement of Persona Knowledge Utilization in LLMs

arXiv:2603.19313 · 21.71 citations · h-index: 3
Predicted impact top 92% in CL · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses inconsistent persona utilization in LLM role-playing applications. It offers a new paradigm along with tools for evaluation and enhancement, though its improvement over existing role-playing methods is incremental.

The paper tackles the challenge of maintaining consistent characterization in long, open-ended dialogues for LLM role-playing. It proposes the Memory-Driven Role-Playing paradigm, which frames persona knowledge as the model's internal memory, together with an evaluation framework (MREval), a prompting method (MRPrompt), and a bilingual benchmark (MRBench). Experiments show that MRPrompt enables small models such as Qwen3-8B to match the performance of larger closed-source models such as Qwen3-Max and GLM-4.7.

A core challenge for faithful LLM role-playing is sustaining consistent characterization throughout long, open-ended dialogues, as models frequently fail to recall and accurately apply their designated persona knowledge without explicit cues. To tackle this, we propose the Memory-Driven Role-Playing paradigm. Inspired by Stanislavski's "emotional memory" acting theory, this paradigm frames persona knowledge as the LLM's internal memory store, requiring retrieval and application based solely on dialogue context, thereby providing a rigorous test of depth and autonomous use of knowledge. Centered on this paradigm, we contribute: (1) MREval, a fine-grained evaluation framework assessing four memory-driven abilities - Anchoring, Recalling, Bounding, and Enacting; (2) MRPrompt, a prompting architecture that guides structured memory retrieval and response generation; and (3) MRBench, a bilingual (Chinese/English) benchmark for fine-grained diagnosis. The novel paradigm provides a comprehensive diagnostic for four-staged role-playing abilities across 12 LLMs. Crucially, experiments show that MRPrompt allows small models (e.g., Qwen3-8B) to match the performance of much larger closed-source LLMs (e.g., Qwen3-Max and GLM-4.7), and confirms that upstream memory gains directly enhance downstream response quality, validating the staged theoretical foundation.
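
The abstract describes MRPrompt only at a high level, so the following is a minimal sketch of what a staged, memory-first prompt could look like, not the authors' actual template. The stage names come from the abstract; the one-line glosses, the `Persona` class, the `build_staged_prompt` function, and the example character are all hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field

# Stage names are taken from the abstract. The glosses are hypothetical
# readings of those names, NOT the paper's definitions.
STAGES = [
    ("Anchoring", "Identify which persona you are and what the current turn is about."),
    ("Recalling", "List the memory entries relevant to the current turn."),
    ("Bounding", "Note anything the turn asks for that the memory does not cover."),
    ("Enacting", "Write the in-character reply using only the recalled entries."),
]


@dataclass
class Persona:
    """Hypothetical container: persona knowledge stored as memory entries."""
    name: str
    memories: list[str] = field(default_factory=list)


def build_staged_prompt(persona: Persona, dialogue: list[tuple[str, str]]) -> str:
    """Assemble memory store, dialogue context, and staged instructions."""
    memory_block = "\n".join(
        f"[M{i}] {entry}" for i, entry in enumerate(persona.memories, 1)
    )
    dialogue_block = "\n".join(f"{speaker}: {text}" for speaker, text in dialogue)
    stage_block = "\n".join(
        f"{i}. {name}: {gloss}" for i, (name, gloss) in enumerate(STAGES, 1)
    )
    return (
        f"You are {persona.name}. Treat the entries below as your own memory.\n\n"
        f"Memory:\n{memory_block}\n\n"
        f"Dialogue so far:\n{dialogue_block}\n\n"
        f"Work through these stages in order:\n{stage_block}\n"
    )


if __name__ == "__main__":
    # Made-up persona, used only to show the prompt layout.
    persona = Persona(
        "Captain Mara",
        ["Grew up in a harbor town.", "Lost her first ship in a storm."],
    )
    print(build_staged_prompt(persona, [("User", "Why do you fear storms?")]))
```

The sketch places the memory store and the dialogue ahead of the instructions so that retrieval depends only on the dialogue context, which mirrors the paradigm's stated constraint; how the paper actually orders or words these parts is not given in the abstract.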
