CL May 15, 2025

RAIDEN-R1: Improving Role-awareness of LLMs via GRPO with Verifiable Reward

Zongsheng Wang, Kaili Sun, Bowen Wu, Qun Yu, Ying Li, Baoxun Wang

arXiv:2505.10218v115.511 citations h-index: 5

Originality: Highly original

AI Analysis

This paper addresses role consistency in role-playing conversational agents and reports a strong gain specific to that domain.

The paper tackles role consistency in role-playing conversational agents by proposing RAIDEN-R1, a reinforcement learning framework with a verifiable role-awareness reward. The results show that the 14B-GRPO model achieves 88.04% and 88.65% accuracy on the Script-Based Knowledge and Conversation Memory metrics, respectively, outperforming baseline models.

Role-playing conversational agents (RPCAs) face persistent challenges in maintaining role consistency. To address this, we propose RAIDEN-R1, a novel reinforcement learning framework that integrates Verifiable Role-Awareness Reward (VRAR). The method introduces both singular and multi-term mining strategies to generate quantifiable rewards by assessing role-specific keys. Additionally, we construct a high-quality, role-aware Chain-of-Thought dataset through multi-LLM collaboration, and implement experiments to enhance reasoning coherence. Experiments on the RAIDEN benchmark demonstrate RAIDEN-R1's superiority: our 14B-GRPO model achieves 88.04% and 88.65% accuracy on Script-Based Knowledge and Conversation Memory metrics, respectively, outperforming baseline models while maintaining robustness. Case analyses further reveal the model's enhanced ability to resolve conflicting contextual cues and sustain first-person narrative consistency. This work bridges the non-quantifiability gap in RPCA training and provides insights into role-aware reasoning patterns, advancing the development of RPCAs.
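The abstract describes rewards made quantifiable by checking role-specific keys, used within GRPO training. The following is a minimal sketch of that general idea, not the paper's implementation: the function names, the substring-matching rule, and the reward scale are all assumptions made for illustration, and the sketch does not attempt to reproduce the paper's singular or multi-term mining strategies. The second function shows the group-relative normalization commonly used in GRPO, where each sampled response is scored against the mean and standard deviation of its own group.

```python
from statistics import mean, pstdev


def key_match_reward(response: str, role_keys: list[str]) -> float:
    """Hypothetical verifiable reward: fraction of role-specific keys
    found in the response (case-insensitive substring match)."""
    if not role_keys:
        return 0.0
    text = response.lower()
    hits = sum(1 for key in role_keys if key.lower() in text)
    return hits / len(role_keys)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of sampled responses:
    (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Illustrative keys and responses; not taken from the RAIDEN benchmark.
    keys = ["lighthouse", "1987"]
    group = [
        "I have kept the lighthouse since 1987.",
        "I look after the lighthouse.",
        "I am a sailor.",
    ]
    rewards = [key_match_reward(r, keys) for r in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Because the reward is computed by a fixed rule rather than a learned judge, it can be checked directly against the role profile, which is the sense in which such a reward is "verifiable" in this sketch.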
