CL, May 3

Methods, Data, and Conceptual Change: Reflections from Two Quantitative Diachronic Case Studies

arXiv:2605.02052
AI Analysis

For historical linguists, this paper provides a methodological critique highlighting how dataset properties constrain quantitative analyses of semantic change.

This paper examines how quantitative methods for studying semantic change in historical linguistics are shaped by dataset properties, using two case studies on Early Modern English and scientific writing. It argues that comparative methodological reflection reveals the limits of frequency-based approaches and shows how dataset structure affects the detection of semantic change.

This discussion paper reflects on how quantitative approaches to historical linguistics interact with dataset properties. We draw on two worked examples from English data: quad-based concept modelling of Early Modern English discourse in EEBO-TCP (c. 1470s-1690s; 765M words), and SynFlow analysis of scientific writing in Royal Society Corpus 6.0.4 (1750-1799; drawn from a 78.6M-token open corpus). Comparing the two in parallel, we explore how each approach operationalises concepts, which data assumptions it entails, and which diachronic interpretations it supports. We argue that comparative methodological reflection clarifies the limits of purely lexical, frequency-based approaches and highlights how dataset structure shapes the kinds of semantic change that quantitative methods can reliably detect.

