ERABAL: Enhancing Role-Playing Agents through Boundary-Aware Learning
This work addresses role-consistency issues in human-computer interaction applications, representing an incremental advancement in alignment training for large language models.
The paper tackles the difficulty role-playing agents have in maintaining role-consistency across conversations, particularly on boundary queries, and presents ERABAL, a framework that achieves notable improvements on benchmarks such as WikiRoleEval and CharacterEval while using significantly fewer dialogues than leading approaches.
Role-playing is an emerging application in the field of Human-Computer Interaction (HCI), primarily implemented through the alignment training of a large language model (LLM) with assigned characters. Despite significant progress, role-playing agents (RPLAs) still struggle with maintaining role-consistency across conversations, particularly when confronted with boundary queries subtly related to character attributes. In this paper, we present ERABAL, a framework aimed at enhancing RPLAs' role-playing capabilities through boundary-aware learning. ERABAL encompasses a generation pipeline for role-specific dialogues and a concomitant methodology for alignment training. Through comprehensive evaluations, we demonstrate that ERABAL is both efficient and effective. By training with significantly fewer dialogues than those used in leading approaches, ERABAL achieves notable improvements across WikiRoleEval, CharacterEval, and the role-playing subset of MT-Bench compared to the generalist baseline models. Our code and datasets will be made publicly available to support further research.