A probabilistic assessment of the Indo-Aryan Inner-Outer Hypothesis
This paper offers linguists studying Indo-Aryan language evolution a data-driven assessment of the Inner-Outer hypothesis, though its contribution is incremental: it applies existing NLP methods to a new linguistic problem.
The paper tackles the century-old Inner-Outer hypothesis in Indo-Aryan linguistics by applying a novel Bayesian hierarchical mixed-membership model to a large dataset of sound changes. It finds evidence for cohesive dialect groups and, under a logistic normal prior, a distribution of dialect components across languages that is largely compatible with a core-periphery pattern.
This paper uses a novel data-driven probabilistic approach to address the century-old Inner-Outer hypothesis of Indo-Aryan. I develop a Bayesian hierarchical mixed-membership model to assess the validity of this hypothesis using a large data set of automatically extracted sound changes operating between Old Indo-Aryan and Modern Indo-Aryan speech varieties. I employ different prior distributions in order to model sound change, one of which, the logistic normal distribution, has not received much attention in linguistics outside of Natural Language Processing, despite its many attractive features. I find evidence for cohesive dialect groups that have made their imprint on contemporary Indo-Aryan languages, and find that when a logistic normal prior is used, the distribution of dialect components across languages is largely compatible with a core-periphery pattern similar to that proposed under the Inner-Outer hypothesis.
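As a rough illustration of the logistic normal prior mentioned above, the sketch below draws mixed-membership vectors by sampling from a multivariate normal and mapping the draws onto the simplex with a softmax. It is not the paper's model: the number of dialect components `K`, the covariance matrix, the sample size, and the helper names `softmax` and `sample_logistic_normal` are all assumptions made for this example. The feature it is meant to show is that the covariance matrix lets components be correlated a priori, which a Dirichlet prior cannot express directly.

```python
import numpy as np


def softmax(x):
    """Map real-valued vectors to the probability simplex (row-wise)."""
    z = x - x.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def sample_logistic_normal(mu, cov, n_samples, rng):
    """Draw eta ~ N(mu, cov), then return softmax(eta) as membership vectors."""
    eta = rng.multivariate_normal(mu, cov, size=n_samples)
    return softmax(eta)


rng = np.random.default_rng(0)

K = 3  # hypothetical number of dialect components
mu = np.zeros(K)
# Illustrative covariance: components 0 and 1 are correlated, component 2 is independent.
cov = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Each row is one language's membership vector over the K components.
theta = sample_logistic_normal(mu, cov, n_samples=5000, rng=rng)

# A symmetric Dirichlet draw, for comparison; it has no covariance parameter.
theta_dirichlet = rng.dirichlet(np.ones(K), size=5000)

print(theta[:3])
print(theta_dirichlet[:3])
```

Each row of `theta` sums to one and can be read as one language's share of each latent dialect component. In a full hierarchical model the mean and covariance would themselves be given priors and inferred from the sound change data rather than fixed as they are here.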