CL · LG · SI · Feb 26, 2025

A City of Millions: Mapping Literary Social Networks At Scale

Sil Hamilton, Rebecca M. M. Hicke, David Mimno, Matthew Wilkens

arXiv:2502.19590v2 · Proceedings of the 5th International Conference on Natural Language Processing for Digital Humanities

Originality: Synthesis-oriented

AI Analysis

The dataset is a unique resource for humanities and social science research because it offers large-scale data on historical social worlds, although the method itself is incremental: it automates existing manual extraction methods rather than introducing a new one.

The authors tackled the problem of extracting social networks from multilingual narratives at scale by automating previously manual methods, resulting in a dataset of 70,509 networks with 2.5 million individuals and 2.8 million relationships.

We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for ~30,000 of these texts (73% nonfiction and 27% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 2,510,021 individuals in 2,805,482 pairwise relationships annotated for affinity and relationship type. We achieve this scale by automating previously manual methods of extracting social networks; specifically, we adapt an existing annotation task as a language model prompt, ensuring consistency at scale with the use of structured output. This dataset serves as a unique resource for humanities and social science research by providing data on cognitive models of social realities.
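The abstract's core technique is adapting an annotation task as a language-model prompt whose output is constrained to a fixed structure, so that every response can be parsed into a network the same way. The minimal Python sketch below illustrates what that parsing step might look like. It assumes a hypothetical JSON response with a `relationships` list whose items carry `source`, `target`, `affinity`, and `type` fields, and it uses hypothetical label vocabularies (`AFFINITIES`, `RELATION_TYPES`). The paper's actual prompt, schema, and categories are not given here, and the function names `parse_network` and `network_stats` are illustrative only.

```python
import json
from dataclasses import dataclass

# Hypothetical label sets; the paper's actual annotation categories may differ.
AFFINITIES = {"positive", "neutral", "negative"}
RELATION_TYPES = {"familial", "romantic", "professional", "social"}


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    affinity: str
    rel_type: str


def parse_network(raw: str) -> list[Edge]:
    """Parse one model response (assumed to be JSON) into validated, deduplicated edges."""
    data = json.loads(raw)
    edges: dict[tuple[str, str], Edge] = {}
    for item in data.get("relationships", []):
        src, tgt = item["source"].strip(), item["target"].strip()
        aff, typ = item["affinity"], item["type"]
        # Skip empty names and self-loops.
        if not src or not tgt or src == tgt:
            continue
        # Skip labels outside the fixed vocabulary (structured output should prevent these).
        if aff not in AFFINITIES or typ not in RELATION_TYPES:
            continue
        # Treat relationships as undirected: sort the pair so (A, B) and (B, A) share a key.
        a, b = sorted((src, tgt))
        edges.setdefault((a, b), Edge(a, b, aff, typ))
    return list(edges.values())


def network_stats(edges: list[Edge]) -> tuple[int, int]:
    """Return (number of individuals, number of pairwise relationships)."""
    people = {e.source for e in edges} | {e.target for e in edges}
    return len(people), len(edges)
```

For example, with a made-up response that repeats one pair in reverse order, includes a self-loop, and uses an out-of-vocabulary label, only two relationships among three individuals survive:

```python
raw = json.dumps({"relationships": [
    {"source": "Elizabeth", "target": "Darcy", "affinity": "positive", "type": "romantic"},
    {"source": "Darcy", "target": "Elizabeth", "affinity": "positive", "type": "romantic"},
    {"source": "Elizabeth", "target": "Jane", "affinity": "positive", "type": "familial"},
    {"source": "Jane", "target": "Jane", "affinity": "neutral", "type": "social"},
    {"source": "Wickham", "target": "Darcy", "affinity": "hostile", "type": "social"},
]})
print(network_stats(parse_network(raw)))  # (3, 2)
```

Validating against a closed label set is the step that makes per-text outputs comparable when aggregated across tens of thousands of texts; whether the authors deduplicate or treat relationships as undirected in this way is an assumption of the sketch.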

