CL, Jul 10, 2016

Syntactic Phylogenetic Trees

Kevin Shu, Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli

arXiv:1607.02791v13.610 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses computational linguistic phylogenetics. Its contribution is incremental: it focuses on making better use of existing data rather than introducing a new method.

The paper addresses the problem of unreliable phylogenetic trees reconstructed from SSWL syntactic data. It identifies sources of error and proposes corrections, such as prior subdivision into language families and subfamilies and better use of the information about ancient languages. After these adjustments, the SSWL data match reliable phylogenetic trees extremely well in simple examples.

In this paper we identify several serious problems that arise in the use of syntactic data from the SSWL database for the purpose of computational phylogenetic reconstruction. We show that the most naive approach fails to produce reliable linguistic phylogenetic trees. We identify some of the sources of the observed problems and we discuss how they may be, at least partly, corrected by using additional information, such as prior subdivision into language families and subfamilies, and a better use of the information about ancient languages. We also describe how the use of phylogenetic algebraic geometry can help in estimating to what extent the probability distribution at the leaves of the phylogenetic tree obtained from the SSWL data can be considered reliable, by testing it on phylogenetic trees established by other forms of linguistic analysis. In simple examples, we find that, after restricting to smaller language subfamilies and considering only those SSWL parameters that are fully mapped for the whole subfamily, the SSWL data match reliable phylogenetic trees extremely well, according to the evaluation of phylogenetic invariants. This is a promising sign for the use of SSWL data for linguistic phylogenetics.
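The two steps named in the abstract (keeping only parameters that are fully mapped for a subfamily, then checking the resulting leaf distribution against a known tree) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes binary parameters, a four-language subfamily, and the standard rank condition on a flattening matrix as a stand-in for the invariants evaluated in the paper: under a general two-state Markov model, the 4x4 flattening along a true split of the tree has rank at most 2. The language names, toy values, and function names are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: binary syntactic parameters for four languages.
# None marks a parameter that is not mapped for that language.
TOY_TABLE = {
    "A": [1, 0, 1, None, 0, 1],
    "B": [1, 0, 1, 1, 0, 1],
    "C": [0, 0, 1, 1, 1, 1],
    "D": [0, None, 1, 0, 1, 1],
}


def fully_mapped_columns(table):
    """Indices of parameters that have a 0/1 value for every language."""
    languages = list(table)
    n_params = len(table[languages[0]])
    return [
        j for j in range(n_params)
        if all(table[lang][j] in (0, 1) for lang in languages)
    ]


def leaf_distribution(table, order, columns):
    """Empirical joint distribution of values at the leaves (2 x 2 x ... x 2)."""
    counts = np.zeros((2,) * len(order))
    for j in columns:
        counts[tuple(table[lang][j] for lang in order)] += 1
    return counts / counts.sum()


def flattening_singular_values(dist, left=(0, 1)):
    """Singular values of the flattening of dist along the split left | rest."""
    right = tuple(i for i in range(dist.ndim) if i not in left)
    matrix = np.transpose(dist, left + right).reshape(
        2 ** len(left), 2 ** len(right)
    )
    return np.linalg.svd(matrix, compute_uv=False)


columns = fully_mapped_columns(TOY_TABLE)
dist = leaf_distribution(TOY_TABLE, ["A", "B", "C", "D"], columns)
# Assumed tree ((A,B),(C,D)): test the split AB | CD.
singular_values = flattening_singular_values(dist, left=(0, 1))
print(columns, singular_values)
```

In this sketch, third and fourth singular values close to zero are read as the data being consistent with the assumed split. With real SSWL data the values would not vanish exactly, so a tolerance or a comparison across the three possible splits would be needed.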
