CL, LG, SI · Feb 26, 2025

A City of Millions: Mapping Literary Social Networks At Scale

arXiv:2502.19590v2
12 citations · h-index: 12
Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities
Originality: Synthesis-oriented
AI Analysis

The dataset is a unique resource for humanities and social science research, offering large-scale data on historical social worlds. Its methodological contribution is incremental: it automates existing manual methods rather than introducing new ones.

The authors extract social networks from multilingual narratives at scale by automating previously manual methods. The result is a dataset of 70,509 networks covering 2.5 million individuals and 2.8 million relationships.

We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for ~30,000 of these texts (73% nonfiction and 27% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482 pairwise relationships annotated for affinity and relationship type. We achieve this scale by automating previously manual methods of extracting social networks; specifically, we adapt an existing annotation task as a language model prompt, ensuring consistency at scale with the use of structured output. This dataset serves as a unique resource for humanities and social science research by providing data on cognitive models of social realities.
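
The abstract says that consistency at scale comes from structured output. A minimal sketch of what validating one such output could look like follows. It is not the authors' schema or code: the JSON layout, the field names (`individuals`, `relationships`, `source`, `target`, `affinity`, `type`), and both label sets are assumptions made purely for illustration.

```python
import json
from dataclasses import dataclass

# Hypothetical label sets; the paper's actual categories are not stated here.
AFFINITIES = {"positive", "negative", "neutral"}
RELATION_TYPES = {"family", "romantic", "professional", "social", "other"}


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    affinity: str
    relation_type: str


def parse_network(raw: str) -> list[Relationship]:
    """Parse one model response (assumed JSON) into validated relationships.

    Raises ValueError if the response does not match the assumed schema.
    """
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    try:
        people = set(data["individuals"])
        items = data["relationships"]
        edges = [
            Relationship(i["source"], i["target"], i["affinity"], i["type"])
            for i in items
        ]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"response does not match the assumed schema: {exc!r}")

    for rel in edges:
        # Every endpoint must be a declared individual, and no self-loops.
        if rel.source not in people or rel.target not in people:
            raise ValueError(f"unknown individual in {rel}")
        if rel.source == rel.target:
            raise ValueError(f"self-relationship in {rel}")
        # Labels must come from the closed (hypothetical) vocabularies.
        if rel.affinity not in AFFINITIES:
            raise ValueError(f"unexpected affinity: {rel.affinity}")
        if rel.relation_type not in RELATION_TYPES:
            raise ValueError(f"unexpected relationship type: {rel.relation_type}")
    return edges
```

Rejecting any response that falls outside a closed vocabulary is one plausible way to keep millions of extracted relationships comparable across texts and languages; the paper may enforce its structure differently.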
