BaiJia: A Large-Scale Role-Playing Agent Corpus of Chinese Historical Characters
This work addresses the scarcity of data for developing AI-driven historical role-playing agents, a low-resource setting. Its contribution is incremental, as it compiles existing information into a structured corpus for LLMs.
The authors tackle the problem of fragmented historical textual records by introducing BaiJia, a large-scale corpus of Chinese historical characters. They show that it enhances the role-playing abilities of foundational LLMs, with experiments reporting improved performance on historical role-playing tasks.
We introduce BaiJia, a comprehensive, large-scale role-playing agent corpus covering a wide range of Chinese historical characters. The corpus is noteworthy as the pioneering compilation of low-resource data that large language models (LLMs) can use to power AI-driven historical role-playing agents. BaiJia addresses the challenge of historical textual records that are fragmented across different forms and modalities by integrating each character's information, including biography, literary works, family relations, historical events, and more. We conduct extensive experiments demonstrating that the BaiJia agent corpus strengthens the role-playing abilities of various foundational LLMs and supports the development and assessment of LLMs on historical role-playing tasks. The agent corpus is available at baijia.online.