Bioinformatics and Classical Literary Study
This work addresses quantitative literary analysis for classicists and other researchers. It is an incremental application of existing scientific methods to a new domain.
The paper analyzes literary phenomena such as authorial style and intertextuality with methods from computational biology and natural language processing (NLP). Its case study uses sequence alignment to detect inexact verbal similarities in Latin texts.
This paper describes the Quantitative Criticism Lab, a collaborative initiative among classicists, quantitative biologists, and computer scientists to apply ideas and methods drawn from the sciences to the study of literature. A core goal of the project is the use of computational biology, natural language processing, and machine learning techniques to investigate authorial style, intertextuality, and related phenomena of literary significance. As a case study in our approach, we review here the use of sequence alignment, a common technique in genomics and computational linguistics, to detect intertextuality in Latin literature. Sequence alignment is distinguished by its ability to find inexact verbal similarities, which makes it ideal for identifying phonetic echoes in large corpora of Latin texts. Although especially suited to Latin, sequence alignment can in principle be extended to many other languages.
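To make the idea of inexact matching concrete, the following is a minimal sketch of character-level local alignment in the Smith-Waterman style. It is an illustration under stated assumptions, not the lab's implementation: the choice of Smith-Waterman, the function name `local_align`, and the scoring values (match, mismatch, gap) are all assumptions introduced here.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman-style local alignment of two strings.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not parameters taken from the paper.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of any local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Two spellings of the same phrase, differing in one character
    # (chosen purely for illustration).
    score, x, y = local_align("arma virumque", "arma uirumque")
    print(score)
    print(x)
    print(y)
```

Because a mismatch is penalized rather than forbidden, the two spellings still align along their full length; an exact-match search would instead split them at the differing character. Scaling this to large corpora would need a faster search strategy than the quadratic table used here.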