NELG · Apr 13, 2021

Multiple regression techniques for modeling dates of first performances of Shakespeare-era plays

arXiv:2104.05929v2
AI Analysis

This work addresses a specific problem in literary history, the dating of plays, by giving scholars a computational method. It is incremental, however, because it builds on existing regression techniques.

The study tackled the problem of dating Shakespeare-era plays by applying 11 regression methods to predict first-performance dates from word probabilities. It also introduced a memetic-algorithm-based Continued Fraction Regression (CFR) whose models use few variables, which reduces dimensionality and improves interpretability.
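To illustrate why CFR models can be compact and interpretable, the sketch below evaluates a continued fraction f(x) = g0(x) + h0(x) / (g1(x) + h1(x) / (g2(x) + ...)) whose terms g_i and h_i are linear in a word-probability vector x. This nested linear form, the helper names `cfr_predict` and `variables_used`, and all coefficient values are assumptions for illustration only; the memetic search that fits the coefficients is not shown. A zero coefficient removes a word from the model, which is how the number of variables stays small.

```python
import numpy as np


def cfr_predict(x, g_coefs, g_bias, h_coefs, h_bias):
    """Evaluate a depth-d continued fraction with linear terms (assumed form):

        f(x) = g_0(x) + h_0(x) / (g_1(x) + h_1(x) / (... + h_{d-1}(x) / g_d(x)))

    where g_i(x) = g_coefs[i] @ x + g_bias[i] and h_i(x) = h_coefs[i] @ x + h_bias[i].
    g_coefs has d + 1 rows and h_coefs has d rows.
    """
    depth = len(h_coefs)
    # Start from the innermost denominator g_d(x) and fold outwards.
    # Note: no guard against a zero denominator in this sketch.
    value = g_coefs[depth] @ x + g_bias[depth]
    for i in range(depth - 1, -1, -1):
        value = g_coefs[i] @ x + g_bias[i] + (h_coefs[i] @ x + h_bias[i]) / value
    return value


def variables_used(g_coefs, h_coefs):
    """Indices of features (words) with at least one non-zero coefficient."""
    all_coefs = np.vstack([np.asarray(g_coefs), np.asarray(h_coefs)])
    return np.flatnonzero(np.any(all_coefs != 0, axis=0))


# Hypothetical depth-1 model over 5 word-probability features.
x = np.array([0.02, 0.0, 0.01, 0.0, 0.05])
g_coefs = np.array([[100.0, 0, 0, 0, 0], [0, 0, 0, 0, 0]])
g_bias = np.array([1600.0, 2.0])
h_coefs = np.array([[0, 0, 200.0, 0, 0]])
h_bias = np.array([0.0])

print(cfr_predict(x, g_coefs, g_bias, h_coefs, h_bias))  # 1603.0
print(variables_used(g_coefs, h_coefs))  # [0 2]
```

In this made-up model only features 0 and 2 carry non-zero coefficients, so the predicted date depends on just two words, which is the kind of sparsity that makes a model easy to read.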

The date of the first performance of a play of Shakespeare's time must usually be guessed with reference to multiple indirect external sources, or to some aspect of the content or style of the play. Identifying these dates is important to literary history and to accounts of developing authorial styles, such as Shakespeare's. In this study, we took a set of Shakespeare-era plays (181 plays from the period 1585–1610), added the best-guess dates for them from a standard reference work as metadata, and calculated the probabilities of individual words in these samples. We applied 11 regression methods to predict the dates of the plays using an 80/20 training/test split. We withdrew one play at a time and used the best-guess date metadata, together with the word probabilities and weightings, to infer its date, thereby building a model of the interaction between dates and word probabilities. We introduced a memetic-algorithm-based Continued Fraction Regression (CFR), which produced models using only a small number of variables, yielding interpretable models with reduced dimensionality. An in-depth analysis of the 20 words that occurred most often in the CFR models across 100 independent runs helps explain the trends in linguistic and stylistic terms. Analysis of this subset of words also revealed an interesting correlation between signature words and the genre of the plays.
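A minimal sketch of the withhold-one-play setup described above, under stated assumptions: each play is represented by the relative frequencies of a fixed vocabulary of words, and closed-form ridge regression stands in for the regression step. Ridge is a swapped-in technique chosen for brevity; it is not claimed to be one of the paper's 11 methods or its CFR. Whitespace tokenization, the vocabulary, the `alpha` value, the helper names, and the data are all hypothetical.

```python
from collections import Counter

import numpy as np


def word_probabilities(text, vocabulary):
    """Relative frequency of each vocabulary word in a lower-cased, whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return np.array([counts[w] / total for w in vocabulary])


def ridge_fit(X, y, alpha):
    """Closed-form ridge regression; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def leave_one_out_predictions(X, y, alpha=1e-6):
    """Predict each play's date from a model fitted on all the other plays."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # withhold play i
        w, b = ridge_fit(X[train], y[train], alpha)
        preds[i] = X[i] @ w + b
    return preds


# Toy usage: synthetic features and dates, not data from the paper.
print(word_probabilities("the king and the queen", ["the", "king", "crown"]))
rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = 1590 + X @ np.array([10.0, -5.0, 3.0])
preds = leave_one_out_predictions(X, y)
print(np.abs(preds - y).mean())
```

The tiny default `alpha` only suits this exactly linear toy data; with 181 plays and many word features, the penalty would need to be tuned, for example by the same leave-one-out loop.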
