Pavel Sofroniev

0.7 · CL · Feb 16, 2017

Fast and unsupervised methods for multilingual cognate clustering

Taraka Rama, Johannes Wahle, Pavel Sofroniev et al.

In this paper we explore the use of unsupervised methods for detecting cognates in multilingual word lists. We use online EM to train sound segment similarity weights for computing the similarity between two words. We tested our online systems on sixteen language groups from geographically diverse parts of the world and show that the Online PMI (Pointwise Mutual Information) system outperforms an HMM-based system and two linguistically motivated systems, LexStat and ALINE. Our results suggest that a PMI system trained in an online fashion can be used by historical linguists for fast and accurate identification of cognates in less well-studied language families.
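
As a rough illustration of the PMI idea mentioned in the abstract, the sketch below scores pairs of aligned sound segments with the standard definition PMI(a, b) = log(p(a, b) / (p(a) p(b))). It is a simplified batch computation, not the paper's online EM training; the function name `pmi_scores` and the toy alignment data are assumptions made only for this example.

```python
import math
from collections import Counter


def pmi_scores(aligned_pairs):
    """Return a dict mapping each observed segment pair (a, b) to its PMI.

    aligned_pairs: iterable of (a, b) tuples, one per aligned segment pair.
    Probabilities are plain relative frequencies (no smoothing).
    """
    pairs = list(aligned_pairs)
    total = len(pairs)
    pair_counts = Counter(pairs)
    left_counts = Counter(a for a, _ in pairs)    # segments from the first word
    right_counts = Counter(b for _, b in pairs)   # segments from the second word

    scores = {}
    for (a, b), n in pair_counts.items():
        p_ab = n / total
        p_a = left_counts[a] / total
        p_b = right_counts[b] / total
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores


# Hypothetical toy alignments, invented for illustration only.
toy_pairs = [("p", "p"), ("p", "p"), ("p", "f"), ("t", "t")]
scores = pmi_scores(toy_pairs)
```

A positive score means the two segments are aligned more often than chance would predict. In a full system, scores like these would serve as the segment similarity weights inside a word alignment procedure and would be re-estimated as new alignments are produced.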
