DL · Mar 12

Making Chant Computing Easy: CantusCorpus v1.0 and the PyCantus Library

arXiv:2603.11933v15.4 · h-index: 5
Predicted impact: top 22% in DL · last 90 days · Originality: Synthesis-oriented
AI Analysis

This work addresses the need for computational tools in the digital humanities, serving both chant scholars and other practitioners, by providing infrastructure for processing large chant resources. The contribution is incremental in that it builds on existing databases.

The authors tackled the problem of limited computational access to digital Gregorian chant data by compiling CantusCorpus v1.0, a dataset combining nearly 900,000 chants from multiple databases, and creating the PyCantus library to facilitate data integration and analysis, making chant research more transparent and accessible.

Digital Gregorian chant scholarship has for decades enjoyed the privilege of a large digital resource cataloguing chant sources: the Cantus ecosystem, with nearly 900,000 chants catalogued across more than 2000 sources. The Cantus Database data model and the Cantus ID mechanism have been adopted by 18 more chant databases, jointly accessible through the Cantus Index interface. However, this data has only been available piecemeal via the individual online user interfaces; computational methods have so far had only a limited opportunity to process these immense resources. To overcome this hurdle, we compiled CantusCorpus v1.0, a dataset that combines everything that was available across the Cantus Index-centered network of databases as of mid-2025, and we have also provided the code for updating the dataset as the databases grow. We then created the lightweight PyCantus library for working with this data. PyCantus decouples the data model from the Cantus codebase and thus allows integration of further chant data sources, which we illustrate with harmonising pilot data from the Corpus Monodicum project. Computational chant research is attractive, and CantusCorpus v1.0 and PyCantus are infrastructures that should make work in this field more transparent, replicable, and accessible to digital humanities practitioners beyond chant scholars themselves.
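
As a rough illustration of why a shared identifier such as the Cantus ID makes records from separate databases combinable, the sketch below indexes chant records from two sources by that identifier. It uses only the Python standard library and does not reproduce the PyCantus API: the `ChantRecord` class, its field names, the `group_by_cantus_id` helper, and all example values are hypothetical and chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ChantRecord:
    """Hypothetical minimal chant record; the field names are assumptions."""
    cantus_id: str   # shared identifier linking the same chant across databases
    source_id: str   # source (e.g. manuscript) the record was catalogued from
    database: str    # database that contributed the record
    incipit: str     # opening words of the chant


def group_by_cantus_id(records):
    """Index records from any number of databases by their Cantus ID."""
    index = defaultdict(list)
    for record in records:
        index[record.cantus_id].append(record)
    return dict(index)


# Made-up example rows, not real catalogue entries.
records = [
    ChantRecord("id-A", "source-1", "db-one", "Example incipit A"),
    ChantRecord("id-A", "source-2", "db-two", "Example incipit A"),
    ChantRecord("id-B", "source-1", "db-one", "Example incipit B"),
]

index = group_by_cantus_id(records)

# For each Cantus ID, list the databases that catalogue that chant.
shared = {cid: sorted({r.database for r in recs}) for cid, recs in index.items()}
print(shared)
```

Because the record type here depends on nothing but the identifier and a few descriptive fields, rows from an additional database could be appended to `records` without changing the grouping logic, which is the kind of decoupling the abstract describes in general terms.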

