Persistent Topology of Syntax
This work helps linguists and computational researchers study linguistic structure and the historical relationships among languages. Its contribution is incremental, since it applies existing topological methods to linguistic data.
The authors investigate the persistent homology of syntactic parameters across the world's languages and find that non-trivial persistent homology emerges within specific language families, particularly Indo-European and Niger-Congo. Concretely, the persistent first homology generator found in the Indo-European family is linked to the position of Ancient Greek and the Hellenic branch rather than to the Anglo-Norman bridge.
We study the persistent homology of the data set of syntactic parameters of the world's languages. We show that, while homology generators behave erratically over the whole data set, non-trivial persistent homology appears when one restricts to specific language families. Different families exhibit different persistent homology. We focus on the cases of the Indo-European and the Niger-Congo families, for which we compare persistent homology over different cluster filtering values. We investigate the possible significance, in historical linguistic terms, of the presence of persistent generators of the first homology. In particular, we show that the persistent first homology generator we find in the Indo-European family is not due (as one might guess) to the Anglo-Norman bridge in the Indo-European phylogenetic network, but is related to the position of Ancient Greek and the Hellenic branch within the network.
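To make the notion of a persistent first homology generator concrete, here is a minimal sketch of persistent homology over Z/2 on a toy data set. It is an illustration under stated assumptions, not the authors' pipeline: the four binary vectors in `toy_languages` are hypothetical, the Hamming distance and the Vietoris-Rips filtration are assumed choices, and the function names are invented for this example. The abstract does not specify the distance, the complex, or the software used.

```python
from itertools import combinations

def hamming(u, v):
    """Number of positions where two binary parameter vectors differ."""
    return sum(a != b for a, b in zip(u, v))

def rips_filtration(points, max_dim=2):
    """Vietoris-Rips filtration as a sorted list of (value, simplex) pairs."""
    n = len(points)
    dist = {frozenset((i, j)): hamming(points[i], points[j])
            for i, j in combinations(range(n), 2)}
    simplices = [(0, (i,)) for i in range(n)]
    for k in range(2, max_dim + 2):
        for s in combinations(range(n), k):
            # A simplex enters the filtration when its longest edge does.
            value = max(dist[frozenset(e)] for e in combinations(s, 2))
            simplices.append((value, s))
    # Sort by value, then dimension, so faces always precede cofaces.
    simplices.sort(key=lambda t: (t[0], len(t[1]), t[1]))
    return simplices

def persistence_intervals(filtration, dim):
    """(birth, death) intervals of positive length in degree `dim` over Z/2.

    `death` is None for classes that never die. With triangles as the top
    simplices (max_dim=2), the result is only meaningful for dim 0 and 1.
    """
    index = {s: i for i, (_, s) in enumerate(filtration)}
    reduced = {}   # pivot row -> reduced boundary column (set of row indices)
    killer = {}    # pivot row -> index of the simplex that kills that class
    positive = []  # indices of simplices that create a new class
    for j, (_, s) in enumerate(filtration):
        col = set()
        if len(s) > 1:
            col = {index[f] for f in combinations(s, len(s) - 1)}
        # Standard column reduction: cancel pivots already claimed earlier.
        while col and max(col) in reduced:
            col ^= reduced[max(col)]
        if col:
            pivot = max(col)
            reduced[pivot] = col
            killer[pivot] = j
        else:
            positive.append(j)
    intervals = []
    for i in positive:
        birth, s = filtration[i]
        if len(s) != dim + 1:
            continue
        death = filtration[killer[i]][0] if i in killer else None
        if death is None or death > birth:
            intervals.append((birth, death))
    return intervals

# Hypothetical data: four "languages" with four binary parameters each,
# arranged so that they form a square in Hamming distance.
toy_languages = [
    (0, 0, 0, 0),
    (1, 1, 0, 0),
    (1, 1, 1, 1),
    (0, 0, 1, 1),
]

filtration = rips_filtration(toy_languages)
h1 = persistence_intervals(filtration, 1)
print(h1)  # [(2, 4)]
```

In this toy case the four sides of the square appear at distance 2 and close a loop, which is filled in only when the diagonals and triangles appear at distance 4. The single interval (2, 4) is the kind of first homology class whose persistence the abstract refers to; on real data one would restrict the point cloud to a single language family before computing it.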