CL · LG · Oct 27, 2023

T5 meets Tybalt: Author Attribution in Early Modern English Drama Using Large Language Models

arXiv:2310.18454v1 · 9 citations · h-index: 14
Originality: Incremental advance
AI Analysis

This work addresses authorship attribution for literary scholars, but it is incremental: it applies existing LLM methods to a specific domain, and the authors note its limitations.

The study tackled authorship identification in Early Modern English drama using large language models. It found that a fine-tuned T5-large model outperforms baselines such as logistic regression and SVM on short passages, but that the model is prone to confident misattributions influenced by its pre-training data.

Large language models have shown breakthrough potential in many NLP domains. Here we consider their use for stylometry, specifically authorship identification in Early Modern English drama. We find both promising and concerning results; LLMs are able to accurately predict the author of surprisingly short passages but are also prone to confidently misattribute texts to specific authors. A fine-tuned t5-large model outperforms all tested baselines, including logistic regression, SVM with a linear kernel, and cosine delta, at attributing small passages. However, we see indications that the presence of certain authors in the model's pre-training data affects predictive results in ways that are difficult to assess.
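Of the baselines named in the abstract, cosine delta is the simplest to state. The following is a minimal sketch, not the authors' implementation. It assumes one concatenated training text per author, a naive lowercase tokenizer, and z-scores computed across author profiles. The paper's actual feature count, tokenization, and normalization are not given here. Relative frequencies of the most frequent words are standardized, and a passage goes to the author whose z-score vector has the smallest cosine distance from it. The function name `cosine_delta_attribute` and the `n_mfw` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer: a simplifying assumption, not the paper's preprocessing.
    return re.findall(r"[a-z']+", text.lower())


def cosine_delta_attribute(training: dict[str, str], passage: str, n_mfw: int = 100) -> str:
    """Hypothetical helper: attribute `passage` to the author with the smallest cosine delta."""
    tokenized = {author: tokenize(text) for author, text in training.items()}

    # Feature set: the n_mfw most frequent words across all training text.
    pooled = Counter(tok for toks in tokenized.values() for tok in toks)
    vocab = [word for word, _ in pooled.most_common(n_mfw)]

    def rel_freqs(tokens: list[str]) -> list[float]:
        counts = Counter(tokens)
        total = len(tokens) or 1
        return [counts[w] / total for w in vocab]

    profiles = {author: rel_freqs(toks) for author, toks in tokenized.items()}

    # Per-word mean and population standard deviation across author profiles.
    n = len(profiles)
    columns = list(zip(*profiles.values()))
    means = [sum(col) / n for col in columns]
    sds = [math.sqrt(sum((x - m) ** 2 for x in col) / n) for col, m in zip(columns, means)]

    def zscores(vec: list[float]) -> list[float]:
        # Words with zero spread carry no discriminating signal, so they map to 0.
        return [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(vec, means, sds)]

    def cosine_distance(u: list[float], v: list[float]) -> float:
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        # Treat a zero vector as orthogonal to everything.
        return 1.0 - dot / norm if norm > 0 else 1.0

    target = zscores(rel_freqs(tokenize(passage)))
    distances = {author: cosine_distance(zscores(p), target) for author, p in profiles.items()}
    return min(distances, key=distances.get)
```

A toy usage example with invented two-author data (illustrative only):

```python
training = {
    "A": "thou art the sun thou art the moon thee and thou",
    "B": "you are the sun you are the moon you and your",
}
print(cosine_delta_attribute(training, "thou art my star and thee"))  # → A
```

Real stylometric setups typically standardize across many individual documents rather than one profile per author, and they use hundreds of most-frequent words. The sketch keeps only the core idea: z-scored word frequencies compared by angle rather than by raw distance.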

