A Benchmark for Understanding and Generating Dialogue between Characters in Stories
This work addresses character-aware dialogue processing in narratives. Its contribution is incremental: it builds on existing models and adds new tasks and data.
The paper tackles the problem of enabling machines to understand and generate dialogue in stories. It proposes two new tasks, builds the DialStory dataset of 105k Chinese stories, and shows that learning explicit character representations improves dialogue coherence and speaker recognition accuracy over baselines.
Many classical fairy tales, works of fiction, and screenplays use dialogue to advance the plot and establish characters. We present the first study to explore whether machines can understand and generate dialogue in stories, which requires capturing the traits of different characters and the relationships between them. To this end, we propose two new tasks, Masked Dialogue Generation and Dialogue Speaker Recognition, which respectively require generating missing dialogue turns and predicting the speakers of specified dialogue turns. To support evaluation, we build a new dataset, DialStory, which consists of 105k Chinese stories with extensive dialogue woven into the plots. We demonstrate the difficulty of the proposed tasks by evaluating existing models on DialStory with both automatic and manual evaluation. Furthermore, we propose to learn explicit character representations to improve performance on these tasks. Extensive experiments and case studies show that our approach generates more coherent and informative dialogue and achieves higher speaker recognition accuracy than strong baselines.
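To make the two task formats concrete, here is a minimal sketch of how an instance of each task might be built from a story split into sentences. The `Turn` record, the `<mask>` placeholder, the candidate-character list, and all function names are hypothetical illustrations, not the DialStory data format or the authors' code. The toy story is in English for readability, whereas DialStory itself is Chinese. Speaker recognition is scored here with plain accuracy, the metric the abstract reports.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MASK = "<mask>"  # hypothetical placeholder for a hidden dialogue turn


@dataclass
class Turn:
    text: str
    speaker: Optional[str]  # None for narration; a character name for dialogue


def make_masked_dialogue_instance(
    story: List[Turn], masked: List[int]
) -> Tuple[List[str], List[str]]:
    """Masked Dialogue Generation: hide selected dialogue turns.

    Returns the story with those turns replaced by MASK, plus the hidden
    turns as generation targets (in story order).
    """
    context, targets = [], []
    for i, turn in enumerate(story):
        if i in masked:
            if turn.speaker is None:
                raise ValueError(f"sentence {i} is narration, not dialogue")
            context.append(MASK)
            targets.append(turn.text)
        else:
            context.append(turn.text)
    return context, targets


def make_speaker_instance(
    story: List[Turn], queried: List[int]
) -> Tuple[List[str], List[str], List[str]]:
    """Dialogue Speaker Recognition: predict who utters the queried turns.

    Returns the story text, the candidate characters, and the gold speakers.
    """
    characters = sorted({t.speaker for t in story if t.speaker is not None})
    gold = [story[i].speaker for i in queried]
    return [t.text for t in story], characters, gold


def speaker_accuracy(predicted: List[str], gold: List[str]) -> float:
    """Fraction of queried turns whose predicted speaker matches the gold one."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy example story (hypothetical).
story = [
    Turn("The fox met the crow under a tree.", None),
    Turn('"What a lovely voice you must have!"', "fox"),
    Turn('"Caw!"', "crow"),
]

ctx, targets = make_masked_dialogue_instance(story, [1])
text, candidates, gold = make_speaker_instance(story, [1, 2])
print(ctx[1], targets)                            # <mask> ['"What a lovely voice you must have!"']
print(candidates, gold)                           # ['crow', 'fox'] ['fox', 'crow']
print(speaker_accuracy(["fox", "fox"], gold))     # 0.5
```

Keeping narration and dialogue in one sequence lets both tasks share the same story context; only what is hidden (the turn text or its speaker) differs between them.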