CL

Mar 1, 2025

Unstable Grounds for Beautiful Trees? Testing the Robustness of Concept Translations in the Compilation of Multilingual Wordlists

David Snee, Luca Ciucci, Arne Rubehn, Kellen Parker van Dam, Johann-Mattis List

arXiv:2503.00464v14.91 citations

h-index: 2

Proceedings of the 7th Workshop on Research in Computational Linguistic Typology and Multilingual NLP

Originality: Synthesis-oriented

AI Analysis

This paper addresses uncertainty in phylogenetic studies in comparative linguistics, but it is incremental: it focuses on data quality rather than on new methods.

The study tested the robustness of concept translations in multilingual wordlists. Across 10 dataset pairs from 9 language families, on average only 83% of translations yield the same word form, and only 23% have identical phonetic transcriptions.

Multilingual wordlists play a crucial role in comparative linguistics. While many studies have been carried out to test the power of computational methods for language subgrouping or divergence time estimation, few studies have put the data upon which these studies are based to a rigorous test. Here, we conduct a first experiment that tests the robustness of concept translation as an integral part of the compilation of multilingual wordlists. Investigating the variation in concept translations in independently compiled wordlists from 10 dataset pairs covering 9 different language families, we find that on average, only 83% of all translations yield the same word form, while identical forms in terms of phonetic transcriptions can only be found in 23% of all cases. Our findings can prove important when trying to assess the uncertainty of phylogenetic studies and the conclusions derived from them.
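As a rough illustration of the stricter of the two measures (identical phonetic transcriptions), the sketch below computes the share of language-concept slots on which two independently compiled wordlists agree exactly. The wordlist layout, the function name `exact_match_rate`, and the toy entries are assumptions made for illustration; they are not taken from the paper and do not reproduce its procedure. The "same word form" measure is a looser criterion than string equality and is not sketched here.

```python
import unicodedata
from typing import Dict, Tuple

# Hypothetical layout: (language, concept) -> phonetic transcription.
Wordlist = Dict[Tuple[str, str], str]


def exact_match_rate(a: Wordlist, b: Wordlist) -> float:
    """Share of slots present in both wordlists whose transcriptions
    are identical strings after Unicode NFC normalization."""
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("the two wordlists share no (language, concept) slots")
    same = sum(
        1
        for key in shared
        if unicodedata.normalize("NFC", a[key]) == unicodedata.normalize("NFC", b[key])
    )
    return same / len(shared)


# Invented toy entries, not data from the study.
list_a: Wordlist = {
    ("LangX", "HAND"): "mano",
    ("LangX", "WATER"): "akwa",
    ("LangX", "STONE"): "petra",
}
list_b: Wordlist = {
    ("LangX", "HAND"): "mano",
    ("LangX", "WATER"): "aqua",
    ("LangX", "STONE"): "petra",
    ("LangX", "FIRE"): "fok",  # no counterpart in list_a, so it is ignored
}

rate = exact_match_rate(list_a, list_b)
print(f"identical transcriptions: {rate:.0%}")
```

Only slots present in both lists are counted, so a concept that one compiler omitted does not lower the rate. Whether that is the right denominator depends on the question being asked.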
