CL | PE | Jul 2, 2025

Beyond cognacy

arXiv:2507.03005v1 | 1 citation | Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP
Originality: Incremental advance
AI Analysis

This work addresses a scalability problem in linguistic phylogenetics: it offers a promising alternative to labor-intensive expert annotation, which could enable global-scale language phylogenies.

The paper tackles the bottleneck of expert-annotated cognate sets in computational phylogenetics for historical linguistics. It compares the established cognate-based method with two fully automated methods and finds that inference based on multiple sequence alignment (MSA) produces trees that are more consistent with expert classifications and that better predict typological variation.

Computational phylogenetics has become an established tool in historical linguistics, with many language families now analyzed using likelihood-based inference. However, standard approaches rely on expert-annotated cognate sets, which are sparse, labor-intensive to produce, and limited to individual language families. This paper explores alternatives by comparing the established method to two fully automated methods that extract phylogenetic signal directly from lexical data. One uses automatic cognate clustering with unigram/concept features; the other applies multiple sequence alignment (MSA) derived from a pair-hidden Markov model. Both are evaluated against expert classifications from Glottolog and typological data from Grambank. Also, the intrinsic strengths of the phylogenetic signal in the characters are compared. Results show that MSA-based inference yields trees more consistent with linguistic classifications, better predicts typological variation, and provides a clearer phylogenetic signal, suggesting it as a promising, scalable alternative to traditional cognate-based methods. This opens new avenues for global-scale language phylogenies beyond expert annotation bottlenecks.
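
To make the notion of cognate-based "characters" concrete, here is a minimal sketch of one common way to encode cognate sets as binary presence/absence characters, the kind of input a likelihood-based phylogenetic tool can consume. The summary above does not specify the paper's exact encoding, so treat this as an illustration under stated assumptions: the toy cognate labels, the function name `binary_character_matrix`, and the simplification of one cognate class per language and concept are all hypothetical.

```python
from collections import defaultdict

# Toy, made-up cognate judgments: (language, concept) -> cognate class label.
# Illustrative only; not taken from the paper's data.
COGNATES = {
    ("German", "hand"): "hand_A",
    ("English", "hand"): "hand_A",
    ("French", "hand"): "hand_B",
    ("German", "dog"): "dog_A",
    ("English", "dog"): "dog_B",
    # French "dog" is deliberately missing to show how gaps are coded.
}


def binary_character_matrix(cognates):
    """Return (columns, matrix) for a cognate dictionary.

    Each column is one (concept, cognate class) pair. A cell is "1" if the
    language's word for that concept belongs to the class, "0" if it belongs
    to a different class, and "?" if the language has no entry for the concept.
    """
    languages = sorted({lang for lang, _ in cognates})

    # Collect the cognate classes attested for each concept.
    classes_by_concept = defaultdict(set)
    for (_, concept), cls in cognates.items():
        classes_by_concept[concept].add(cls)

    # Fix a deterministic column order: concepts, then classes, alphabetically.
    columns = [
        (concept, cls)
        for concept in sorted(classes_by_concept)
        for cls in sorted(classes_by_concept[concept])
    ]

    matrix = {}
    for lang in languages:
        cells = []
        for concept, cls in columns:
            if (lang, concept) not in cognates:
                cells.append("?")  # missing data, not absence
            else:
                cells.append("1" if cognates[(lang, concept)] == cls else "0")
        matrix[lang] = "".join(cells)
    return columns, matrix


if __name__ == "__main__":
    cols, mat = binary_character_matrix(COGNATES)
    print(cols)
    for lang, row in mat.items():
        print(f"{lang:10s} {row}")
```

Coding a missing concept as "?" rather than "0" keeps a gap in the word list from being read as evidence that the language lacks the cognate class.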

