DynSess: Dynamic Session-Level Evaluation and Optimization Framework for Role-Playing Agents
For researchers and developers of role-playing agents, this work provides a unified session-level approach to evaluation and training, targeting long-horizon dialogue quality with smaller character models.
The paper proposes DynSess, a session-level evaluation and optimization framework for role-playing agents that addresses the inability of existing turn-level methods to capture long-horizon quality. The framework comprises DynSess-Eval, which scores complete dialogue sessions, and DynSess-Character models trained with session-level rewards. DynSess-Eval aligns more closely with human judgments than prior evaluators, and in blind human evaluation DynSess-Character matches the strongest character model while using substantially fewer parameters.
Role-playing with large language models is fundamentally a session-level task, requiring agents to sustain character identity and interaction quality across extended multi-turn conversations. Yet existing evaluation and optimization methods remain largely turn-level, failing to capture long-horizon quality. We propose DynSess, a unified session-level framework for role-playing agents. DynSess-Eval scores complete dialogue sessions via rubrics targeting long-horizon behaviors. Leveraging its session-level rewards, we construct high-quality training trajectories through multi-turn lookahead search and train DynSess-Character with two complementary variants: DSPO (off-policy) and GSRPO (on-policy). Experiments show that DynSess-Eval aligns with human judgments substantially better than prior evaluators, and blind human evaluation further shows that DynSess-Character matches the strongest character model despite using substantially fewer parameters, while maintaining strong role consistency and interactive ability. Our dataset and code will be released to facilitate future research.
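To illustrate how a session-level reward can drive multi-turn lookahead search, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the search procedure, so the sketch assumes a simple greedy selection: each candidate reply is extended by a simulated continuation, the whole simulated session is scored, and the best-scoring candidate is kept. All names (`lookahead_select`, `toy_rollout`, `toy_session_reward`) and the toy pirate persona are hypothetical stand-ins; in practice the rollout would involve a user simulator and the agent policy, and the reward would come from a rubric-based evaluator such as DynSess-Eval.

```python
from typing import Callable, List, Sequence

Session = List[str]  # a dialogue as a flat list of "speaker: text" turns


def lookahead_select(
    history: Session,
    candidates: Sequence[str],
    rollout: Callable[[Session, int], Session],
    session_reward: Callable[[Session], float],
    depth: int = 2,
) -> str:
    """Return the candidate whose simulated continuation scores highest.

    The score is computed on the whole simulated session, not on the
    candidate turn alone (assumed greedy selection, for illustration).
    """
    if not candidates:
        raise ValueError("at least one candidate reply is required")
    best, best_score = candidates[0], float("-inf")
    for cand in candidates:
        simulated = rollout(history + [cand], depth)
        score = session_reward(simulated)
        if score > best_score:
            best, best_score = cand, score
    return best


def toy_rollout(session: Session, depth: int) -> Session:
    """Hypothetical simulator: the user asks for more and the agent
    repeats its previous turn, for `depth` further exchanges."""
    out = list(session)
    for _ in range(depth):
        out.append("user: tell me more")
        out.append(out[-2])  # the agent's previous turn
    return out


def toy_session_reward(session: Session) -> float:
    """Hypothetical rubric: fraction of agent turns that stay in a
    pirate persona (detected here by the substring 'arr')."""
    agent_turns = [t for t in session if t.startswith("agent:")]
    if not agent_turns:
        return 0.0
    in_character = sum("arr" in t.lower() for t in agent_turns)
    return in_character / len(agent_turns)


history = ["user: who are you?"]
candidates = ["agent: Hello, how can I help?", "agent: Arr, welcome aboard!"]
chosen = lookahead_select(history, candidates, toy_rollout, toy_session_reward)
print(chosen)
```

Scoring the simulated session rather than the single candidate turn is what separates this from turn-level selection: a reply that looks acceptable in isolation can still lose if the continuation it leads to drifts out of character.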