Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
This work addresses the problem of scaling linguistic analysis for researchers in computational historical linguistics, though it appears incremental as it applies existing MCMC methods to language data.
The paper tackles the challenge of manual annotation in historical linguistics by using computational methods to cluster languages and construct phylogenetic trees via Markov Chain Monte Carlo (MCMC), aiming to reduce workload in identifying cognate words and language relationships.
More and more of the world's languages are now under study, and as a result the traditional approach to historical linguistics faces new challenges. For example, comparative research across languages relies on manual annotation, which becomes increasingly impractical as the amount of available language data grows. Although automatic computational methods can hardly replace the work of linguists, they can help reduce the workload. One of the most important tasks in historical linguistics is comparing words across languages and identifying cognates, which helps determine whether two languages are related. In this paper, I use a computational method to cluster the languages and then apply the Markov Chain Monte Carlo (MCMC) method to build a phylogenetic tree of language relationships based on the clusters.
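To make the word-comparison step concrete, here is a minimal sketch of grouping languages by how similar their words for one concept are. It assumes normalized Levenshtein distance on orthographic forms and single-linkage grouping at a fixed threshold. The abstract does not say which similarity measure or clustering method the paper uses, so the function names, the threshold of 0.5, and the small word list are illustrative assumptions only.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length, so values lie in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))


def cluster_languages(words: dict, threshold: float = 0.5) -> list:
    """Single-linkage grouping: merge languages whose words are close enough.

    `words` maps a language name to its word for one concept.
    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    parent = {lang: lang for lang in words}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(words, 2):
        if normalized_distance(words[a], words[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for lang in words:
        clusters.setdefault(find(lang), set()).add(lang)
    return list(clusters.values())


# Hypothetical toy input: the word for "water" in five languages.
WATER = {
    "English": "water",
    "German": "wasser",
    "Dutch": "water",
    "French": "eau",
    "Spanish": "agua",
}
clusters = cluster_languages(WATER)
```

On this toy input the three Germanic forms fall into one group, while French "eau" and Spanish "agua" stay apart even though both descend from Latin "aqua". Surface string distance alone can therefore miss true cognates, which is one reason such automatic methods assist linguists rather than replace them.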