GraphLit: Learning Text-Enriched Dynamic Character Network Representations for Literary Study
For literary scholars and NLP researchers, this work provides a novel graph-based representation and learning method that captures both character interactions and textual context, enabling better performance on character-related tasks and new insights into narrative structure.
The paper introduces Dynamic Heterogeneous Character Networks (DHCNs) to represent literary texts as temporally localized graphs that align characters with textual context, and proposes GraphLit, a self-supervised learning framework using masked graph autoencoding. GraphLit improves over text-only and graph-only baselines across 12 character-related tasks, particularly those requiring contextual understanding.
Methods that represent literary texts as graphs or sequences of graphs mainly focus on character interactions and often overlook another crucial aspect: the textual context in which characters interact. We introduce Dynamic Heterogeneous Character Networks (DHCNs), which organize long novels into temporally localized heterogeneous graphs that align characters with their textual contexts. We extract around 20,000 DHCNs from Project Gutenberg and propose GraphLit, a self-supervised learning framework that learns rich literary representations through a masked graph autoencoder objective. Across a wide range of 12 character-related tasks, GraphLit improves over text-only and graph-only baselines, particularly on tasks requiring contextual understanding. Finally, we demonstrate the applicability of DHCNs and GraphLit to literary analysis by studying the link between narrative non-linearity and dynamic social features.
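To make the idea of a temporally localized heterogeneous graph more concrete, the following is a minimal Python sketch of how one such graph might be laid out and how a masking step for a masked-autoencoder objective could look. Everything in it is an illustrative assumption rather than the paper's implementation: the `Snapshot` class, the `add_mention` and `mask_nodes` names, the choice of two node types (characters and text contexts), and the decision to mask character nodes are all hypothetical, since the abstract does not specify the schema or what is masked. The encoder and decoder that would reconstruct the hidden parts are omitted.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """One temporally localized graph, e.g. a window of a novel (assumed layout)."""
    index: int                                     # position of the window in the book
    characters: set = field(default_factory=set)   # character node ids
    contexts: dict = field(default_factory=dict)   # context node id -> passage text
    edges: set = field(default_factory=set)        # (character, context id) pairs

    def add_mention(self, character, context_id, text):
        """Link a character node to the text passage in which it appears."""
        self.characters.add(character)
        self.contexts[context_id] = text
        self.edges.add((character, context_id))


def mask_nodes(snapshot, ratio, rng):
    """Hide a fraction of character nodes.

    Returns the edges that stay visible and the set of hidden characters,
    which a model would be trained to reconstruct (hypothetical setup).
    """
    ordered = sorted(snapshot.characters)
    if not ordered:
        return set(snapshot.edges), set()
    k = max(1, round(ratio * len(ordered)))        # hide at least one node
    hidden = set(rng.sample(ordered, k))
    visible = {(c, t) for (c, t) in snapshot.edges if c not in hidden}
    return visible, hidden


# Usage: a toy window with three characters and two passages.
snap = Snapshot(index=0)
snap.add_mention("Elizabeth", "p1", "Elizabeth laughed at Darcy's remark.")
snap.add_mention("Darcy", "p1", "Elizabeth laughed at Darcy's remark.")
snap.add_mention("Jane", "p2", "Jane wrote a letter.")

visible, hidden = mask_nodes(snap, ratio=0.34, rng=random.Random(0))
```

A full DHCN under this sketch would be a list of such snapshots ordered by `index`. Keeping the passage text on its own node type is what lets a model see the context of an interaction alongside the interaction itself.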