CL · May 17, 2022

Letters From the Past: Modeling Historical Sound Change Through Diachronic Character Embeddings

arXiv:2205.08256v1639 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This paper addresses the problem of modeling historical sound change, a less-studied aspect of language change, and offers linguists and NLP researchers a novel computational approach to it.

The paper tackles the detection of sound change in historical sources by using PPMI character embeddings to model how character distributions change through time. It shows that the method can identify several known sound shifts, such as the lenition of plosives in Danish, and uncover meaningful contexts in which they appeared.

While a great deal of work has been done on NLP approaches to lexical semantic change detection, other aspects of language change have received less attention from the NLP community. In this paper, we address the detection of sound change through historical spelling. We propose that a sound change can be captured by comparing the relative distance through time between the distributions of the characters involved, modeled with PPMI character embeddings. We verify this hypothesis on synthetic data and then test the method's ability to trace the well-known historical change of lenition of plosives in Danish historical sources. We show that the models are able to identify several of the changes under consideration and to uncover meaningful contexts in which they appeared. The methodology has the potential to contribute to the study of open questions such as the relative chronology of sound shifts and their geographical distribution.
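The core idea of the abstract, comparing character distributions across time with PPMI embeddings, can be illustrated with a small sketch. This is not the authors' implementation. The function names, the window size of 1, the `#` boundary symbol, and the toy word lists are all assumptions made for illustration, as is the choice to compare vectors across periods directly by giving both periods the same context alphabet. The toy data mimic a lenition-like shift in which intervocalic `p` comes to be spelled `b`.

```python
import numpy as np

BOUNDARY = "#"  # assumed word-boundary symbol, not taken from the paper


def ppmi_char_embeddings(words, alphabet, window=1):
    """Build one PPMI vector per character over a fixed context alphabet."""
    index = {c: k for k, c in enumerate(alphabet)}
    counts = np.zeros((len(alphabet), len(alphabet)))
    for word in words:
        padded = BOUNDARY + word + BOUNDARY
        for i, target in enumerate(padded):
            lo, hi = max(0, i - window), min(len(padded), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[target], index[padded[j]]] += 1
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    # PMI = log2( P(t, c) / (P(t) * P(c)) ); unseen pairs give -inf or nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log2(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    # Positive PMI: clip negative associations to zero.
    return index, np.maximum(pmi, 0.0)


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    norm = np.linalg.norm(u) * np.linalg.norm(v)
    return 1.0 if norm == 0 else 1.0 - float(u @ v) / norm


# Hypothetical toy corpora: intervocalic "p" in the early period is
# spelled "b" in the late period; word-initial "b" is unchanged.
early = ["apa", "opo", "ba", "bo"]
late = ["aba", "obo", "ba", "bo"]
alphabet = sorted(set(BOUNDARY + "".join(early + late)))

idx, emb_early = ppmi_char_embeddings(early, alphabet)
_, emb_late = ppmi_char_embeddings(late, alphabet)

# Distance between "b" and early "p", before and after the simulated shift.
d_before = cosine_distance(emb_early[idx["b"]], emb_early[idx["p"]])
d_after = cosine_distance(emb_late[idx["b"]], emb_early[idx["p"]])
print(f"b vs. early p, before: {d_before:.3f}, after: {d_after:.3f}")
```

In this toy setting, late `b` ends up closer to early `p` than early `b` was, because it has taken over the intervocalic contexts. Because PPMI dimensions are explicit context characters, vectors from different periods share a coordinate system as long as the alphabet is fixed, which is what makes the cross-period comparison in this sketch possible.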

Code Implementations: 1 repo
