CL · Aug 7, 2019

Ab Antiquo: Neural Proto-language Reconstruction

arXiv:1908.02477v3 0.00729 citations
AI Analysis 50

This paper addresses the challenge of automating historical linguistic reconstruction, though it appears incremental, as it applies existing neural methods to this specific domain.

The paper tackles the problem of automating proto-word reconstruction from cognates in contemporary daughter languages, showing that neural sequence models outperform conventional methods on a novel dataset of over 8,000 comparative entries.

Historical linguists have identified regularities in the process of historical sound change. The comparative method utilizes those regularities to reconstruct proto-words based on observed forms in daughter languages. Can this process be efficiently automated? We address the task of proto-word reconstruction, in which the model is exposed to cognates in contemporary daughter languages and has to predict the proto-word in the ancestor language. We provide a novel dataset for this task, encompassing over 8,000 comparative entries, and show that neural sequence models outperform conventional methods applied to this task so far. Error analysis reveals variability in the ability of neural models to capture different phonological changes, correlating with the complexity of the changes. Analysis of learned embeddings reveals that the models learn phonologically meaningful generalizations, corresponding to well-attested phonological shifts documented by historical linguistics.
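To make the task setup concrete, here is a minimal sketch of how one cognate set might be flattened into a single input sequence for a sequence model, and how a predicted proto-word could be scored against the attested form with edit distance. It is an illustration under stated assumptions, not the paper's implementation: the function names `encode_cognate_set` and `edit_distance`, the language-tag format, the choice of edit distance as a metric, and the Romance/Latin example words are all hypothetical choices made for this sketch.

```python
from typing import Dict, List


def encode_cognate_set(cognates: Dict[str, str]) -> List[str]:
    """Flatten a cognate set into one token sequence.

    Each word is preceded by a language tag such as "<es>" and split into
    characters. The tag format is an assumption made for this sketch.
    """
    tokens: List[str] = []
    for lang in sorted(cognates):  # fixed language order keeps inputs consistent
        tokens.append(f"<{lang}>")
        tokens.extend(cognates[lang])  # one token per character
    return tokens


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Illustrative example (not taken from the paper's dataset):
# Romance words for "night" and their Latin ancestor form.
cognates = {"es": "noche", "it": "notte", "fr": "nuit"}
source_tokens = encode_cognate_set(cognates)
gold = "noctem"
# Stand-in for a model prediction: here we simply reuse the Italian form.
predicted = cognates["it"]
error = edit_distance(predicted, gold)
```

A real system would feed `source_tokens` to a trained encoder-decoder and compare its output to `gold`; the stand-in prediction above only shows how the scoring step would work.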

Code Implementations: 2 repos