CL · AI · May 28

DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents

arXiv:2605.29256 · 64.0 · h-index: 8
AI Analysis

For researchers and developers of role-playing agents, this work provides a unified session-level approach to evaluation and training that improves long-horizon dialogue quality and parameter efficiency.

The paper proposes DynSess, a session-level evaluation and optimization framework for role-playing agents that addresses a key limitation of existing turn-level methods: they fail to capture long-horizon quality. The framework comprises DynSess-Eval, which scores complete dialogue sessions, and DynSess-Character models trained with session-level rewards. DynSess-Eval aligns with human judgments better than prior evaluators, and DynSess-Character matches state-of-the-art character models while using fewer parameters.
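DynSess-Eval's actual rubrics, weights, and judge model are not described in this summary. As a minimal sketch of what scoring a complete session (rather than a single reply) can look like, the Python below assumes two hypothetical rubric items named after qualities the abstract mentions (role consistency and interactive ability), with simple keyword heuristics standing in for an LLM judge. `RubricItem`, `session_score`, and the judge functions are illustrative names, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A session is the complete list of (speaker, utterance) turns, not a single reply.
Turn = Tuple[str, str]
Session = List[Turn]


@dataclass
class RubricItem:
    name: str                          # hypothetical rubric dimension
    weight: float                      # hypothetical importance weight
    judge: Callable[[Session], float]  # scores the WHOLE session in [0, 1]


def agent_turns(session: Session) -> List[str]:
    return [utterance for speaker, utterance in session if speaker == "agent"]


def stays_in_character(session: Session) -> float:
    """Toy judge (stand-in for an LLM judge): share of agent turns that keep the persona."""
    turns = agent_turns(session)
    if not turns:
        return 0.0
    broken = sum("as an ai" in t.lower() for t in turns)
    return 1.0 - broken / len(turns)


def keeps_user_engaged(session: Session) -> float:
    """Toy judge: share of agent turns that hand the conversation back with a question."""
    turns = agent_turns(session)
    if not turns:
        return 0.0
    return sum(t.rstrip().endswith("?") for t in turns) / len(turns)


def session_score(session: Session, rubric: List[RubricItem]) -> Dict[str, float]:
    """Score the full session on every rubric item, plus a weighted mean under 'overall'."""
    scores = {item.name: item.judge(session) for item in rubric}
    total_weight = sum(item.weight for item in rubric)
    scores["overall"] = sum(item.weight * scores[item.name] for item in rubric) / total_weight
    return scores


rubric = [
    RubricItem("role_consistency", 2.0, stays_in_character),
    RubricItem("interactive_ability", 1.0, keeps_user_engaged),
]
session = [
    ("user", "Who are you?"),
    ("agent", "I am Captain Reyes of the Starlight. What brings you aboard?"),
    ("user", "I need passage to Mars."),
    ("agent", "Then you will need to earn it by piloting a shuttle."),
    ("user", "Are you a chatbot?"),
    ("agent", "As an AI language model, I cannot answer that."),
]
scores = session_score(session, rubric)
print(scores)
```

Here the persona break happens only in the final turn, so a scorer that looked at the opening reply alone would rate it perfectly; scoring the whole session pulls role consistency down to 2/3 and the weighted overall score to 5/9.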

Role-playing with large language models is fundamentally a session-level task, requiring agents to sustain character identity and interaction quality across extended multi-turn conversations. Yet existing evaluation and optimization methods remain largely turn-level, failing to capture long-horizon quality. We propose DynSess, a unified session-level framework for role-playing agents. DynSess-Eval scores complete dialogue sessions via rubrics targeting long-horizon behaviors. Leveraging its session-level rewards, we construct high-quality training trajectories through multi-turn lookahead search and train DynSess-Character with two complementary variants: DSPO (off-policy) and GSRPO (on-policy). Experiments show that DynSess-Eval aligns with human judgments substantially better than prior evaluators, and blind human evaluation further shows that DynSess-Character matches the strongest character model despite using substantially fewer parameters, while maintaining strong role consistency and interactive ability. Our dataset and code will be released to facilitate future research.
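The abstract says training trajectories are built by multi-turn lookahead search guided by session-level rewards, but it does not give the number of candidates, the search depth, the user simulator, or the rollout policy. The Python below is a minimal sketch under those assumptions: for each candidate reply it rolls the conversation forward a few simulated exchanges and keeps the candidate whose rolled-out session earns the highest reward. `lookahead_select`, `build_trajectory`, and the `toy_*` helpers are hypothetical, and `in_character_rate` is a keyword stand-in for a session-level evaluator such as DynSess-Eval. The DSPO and GSRPO training objectives are not sketched here.

```python
from typing import Callable, List, Optional, Tuple

Turn = Tuple[str, str]
Session = List[Turn]
Propose = Callable[[Session, int], List[str]]  # hypothetical: k candidate agent replies
Simulate = Callable[[Session], str]            # hypothetical: next user / agent utterance
Reward = Callable[[Session], float]            # stand-in for a session-level evaluator


def lookahead_select(history: Session, propose: Propose, simulate_user: Simulate,
                     respond: Simulate, session_reward: Reward,
                     k: int = 4, depth: int = 2) -> Tuple[Optional[str], float]:
    """Pick the candidate whose rolled-out continuation earns the highest session reward."""
    best_reply: Optional[str] = None
    best_value = float("-inf")
    for reply in propose(history, k):
        rollout = history + [("agent", reply)]
        # Look ahead `depth` extra user/agent exchanges before scoring the session.
        for _ in range(depth):
            rollout = rollout + [("user", simulate_user(rollout))]
            rollout = rollout + [("agent", respond(rollout))]
        value = session_reward(rollout)
        if value > best_value:  # strict '>' keeps the earliest candidate on ties
            best_reply, best_value = reply, value
    return best_reply, best_value


def build_trajectory(opening: str, n_turns: int, propose: Propose, simulate_user: Simulate,
                     respond: Simulate, session_reward: Reward,
                     k: int = 4, depth: int = 2) -> Session:
    """Grow one training session turn by turn, choosing each agent reply by lookahead."""
    session: Session = [("user", opening)]
    for _ in range(n_turns):
        reply, _ = lookahead_select(session, propose, simulate_user, respond,
                                    session_reward, k, depth)
        session.append(("agent", reply))
        session.append(("user", simulate_user(session)))
    return session


# --- Toy components (all hypothetical) ---
CANDIDATES = ["Hmm.", "Welcome aboard the Starlight, traveler!"]


def toy_propose(history: Session, k: int) -> List[str]:
    return CANDIDATES[:k]


def toy_user(session: Session) -> str:
    # A vague agent reply makes the simulated user probe the persona.
    last_agent = next(u for s, u in reversed(session) if s == "agent")
    return "Are you a bot?" if last_agent == "Hmm." else "Tell me more."


def toy_respond(session: Session) -> str:
    # The rollout policy breaks character when probed.
    if session[-1][1] == "Are you a bot?":
        return "As an AI, I must say yes."
    return "The Starlight sails at dawn."


def in_character_rate(session: Session) -> float:
    agent = [u for s, u in session if s == "agent"]
    if not agent:
        return 0.0
    return 1.0 - sum("as an ai" in u.lower() for u in agent) / len(agent)


history: Session = [("user", "Who are you?")]
greedy = lookahead_select(history, toy_propose, toy_user, toy_respond, in_character_rate, depth=0)
ahead = lookahead_select(history, toy_propose, toy_user, toy_respond, in_character_rate, depth=1)
trajectory = build_trajectory("Who are you?", 2, toy_propose, toy_user, toy_respond,
                              in_character_rate, depth=1)
print(greedy, ahead, trajectory, sep="\n")
```

With `depth=0` both candidates look in character on their own, so the vague "Hmm." wins the tie. With one step of lookahead, the simulated user probes the persona after "Hmm.", the rollout breaks character, and the search prefers the welcoming reply. This illustrates why selecting replies by a session-level reward over a rollout can differ from judging each reply in isolation.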
