CL · Nov 6, 2023

STONYBOOK: A System and Resource for Large-Scale Analysis of Novels

arXiv:2311.03614v1 · 1 citation · h-index: 57 · Has Code
Originality: Synthesis-oriented
AI Analysis

STONYBOOK gives researchers in digital humanities and NLP a resource for analyzing literary works at scale. The contribution is incremental, since it builds on existing annotation and analysis methods.

The authors address the large-scale analysis of novels with three resources: an open-source NLP pipeline, a collection of 49,207 annotated novels, and a database with a web interface for aggregate analysis. Together these support tasks such as character interaction visualization and readability measurement.

Books have historically been the primary mechanism through which narratives are transmitted. We have developed a collection of resources for the large-scale analysis of novels, including: (1) an open source end-to-end NLP analysis pipeline for the annotation of novels into a standard XML format, (2) a collection of 49,207 distinct cleaned and annotated novels, and (3) a database with an associated web interface for the large-scale aggregate analysis of these literary works. We describe the major functionalities provided in the annotation system along with their utilities. We present samples of analysis artifacts from our website, such as visualizations of character occurrences and interactions, similar books, representative vocabulary, part of speech statistics, and readability metrics. We also describe the use of the annotated format in qualitative and quantitative analysis across large corpora of novels.
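To make the idea of analysis over an annotated XML format concrete, the following is a minimal sketch of counting character occurrences and pairwise co-occurrences. The XML layout used here (`book`, `sentence`, and `character` elements with a `name` attribute) is a hypothetical stand-in chosen for illustration and is not the actual STONYBOOK schema. Treating two characters as interacting when they appear in the same sentence is likewise an assumption, not the authors' stated method.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

# Hypothetical annotated snippet. Element and attribute names are
# illustrative assumptions, not the STONYBOOK format.
SAMPLE_XML = """
<book title="Sample">
  <sentence id="1"><character name="Alice"/><character name="Bob"/></sentence>
  <sentence id="2"><character name="Alice"/></sentence>
  <sentence id="3"><character name="Bob"/><character name="Alice"/><character name="Carol"/></sentence>
</book>
""".strip()


def character_counts(root):
    """Count the number of sentences in which each character is mentioned."""
    counts = Counter()
    for sentence in root.iter("sentence"):
        # A set ensures a character is counted once per sentence.
        names = {c.get("name") for c in sentence.iter("character")}
        counts.update(names)
    return counts


def interaction_counts(root):
    """Count sentences in which each unordered pair of characters co-occurs."""
    counts = Counter()
    for sentence in root.iter("sentence"):
        # Sorting makes each pair appear in one canonical order.
        names = sorted({c.get("name") for c in sentence.iter("character")})
        counts.update(combinations(names, 2))
    return counts


root = ET.fromstring(SAMPLE_XML)
occurrences = character_counts(root)
interactions = interaction_counts(root)

print(occurrences.most_common())
print(interactions.most_common())
```

Counts of this kind could feed an occurrence chart or an interaction graph. A real pipeline would also need coreference resolution and name clustering before counting, which this sketch omits.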

