Distinguishing Fictional Voices: a Study of Authorship Verification Models for Quotation Attribution
This work targets automated speaker detection in literary texts, a problem of interest to researchers in computational linguistics. Its contribution is incremental: the results are mixed, and the authors call for further investigation.
The study attributes quotes to characters in novels using stylistic representations from authorship verification models. It finds that these models can distinguish characters but do not consistently outperform semantic-only models at quote attribution.
Recent approaches to automatically detecting the speaker of an utterance of direct speech often disregard general information about characters in favor of local information found in the context, such as surrounding mentions of entities. In this work, we explore stylistic representations of characters built by encoding their quotes with off-the-shelf pretrained Authorship Verification models on a large corpus of English novels (the Project Dialogism Novel Corpus). Results suggest that the combination of stylistic and topical information captured by some of these models accurately distinguishes characters from one another, but does not necessarily improve over semantic-only models when attributing quotes. However, these results vary across novels, and stylometric models tailored to literary texts and to the study of characters warrant further investigation.
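As a rough illustration of the general idea of building a character representation from that character's quotes, the following minimal sketch encodes each quote, averages the vectors per character, and assigns a new quote to the closest profile by cosine similarity. It is not the method evaluated in this work: the character-trigram encoder is a toy stand-in for a pretrained Authorship Verification model, the nearest-profile attribution rule is an assumption, and all function names, character names, and quotes are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def encode_quote(text):
    """Toy stand-in encoder (assumption): L2-normalised character-trigram counts.

    A real pipeline would call a pretrained model here instead.
    """
    padded = f" {text.lower()} "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    norm = math.sqrt(sum(v * v for v in counts.values()))
    return {k: v / norm for k, v in counts.items()}


def build_character_profiles(quotes_by_character):
    """Average the quote vectors of each character into one sparse profile."""
    profiles = {}
    for name, quotes in quotes_by_character.items():
        total = defaultdict(float)
        for quote in quotes:
            for key, value in encode_quote(quote).items():
                total[key] += value
        profiles[name] = {k: v / len(quotes) for k, v in total.items()}
    return profiles


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(quote, profiles):
    """Return the character whose profile is most similar to the quote."""
    encoded = encode_quote(quote)
    return max(profiles, key=lambda name: cosine(encoded, profiles[name]))


# Invented toy data, not taken from any corpus.
known_quotes = {
    "Lady": ["Indeed, I am quite certain of it.",
             "It is quite impossible, I assure you."],
    "Captain": ["Arr, ye scurvy dogs!",
                "Arr, hoist the sails, ye dogs!"],
}
profiles = build_character_profiles(known_quotes)
predicted = attribute("Arr, ye dogs!", profiles)
```

Swapping `encode_quote` for a call to a pretrained stylistic or semantic encoder would leave the rest of the sketch unchanged, which is one way to compare the two kinds of representation on the same attribution step.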