Number Theory Meets Linguistics: Modelling Noun Pluralisation Across 1497 Languages Using 2-adic Metrics
This work addresses noun pluralisation modelling and is aimed at linguists and computational linguists. Its contribution is incremental, since it applies a known mathematical concept (p-adic metrics) to a specific linguistic task.
The paper models noun pluralisation across 1497 languages as a linear regression under a p-adic metric. It reports that this approach substantially outperforms Euclidean-space regressors in several language families, but finds insufficient evidence for modelling distinct noun declensions as p-adic neighbourhoods in Indo-European languages.
A simple machine learning model of pluralisation as a linear regression problem minimising a p-adic metric substantially outperforms even the most robust of Euclidean-space regressors on languages in the Indo-European, Austronesian, Trans-New Guinea, Sino-Tibetan, Nilo-Saharan, Oto-Manguean and Atlantic-Congo language families. There is insufficient evidence to support modelling distinct noun declensions as a p-adic neighbourhood even in Indo-European languages.
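To make the idea of "linear regression minimising a p-adic metric" concrete, the following is a minimal sketch on toy integer data, not the paper's implementation. It assumes integer-valued inputs and outputs, a single feature, integer coefficients, and a brute-force search over a small coefficient range; the function names (`p_adic_abs`, `p_adic_loss`, `fit_line`) and the data are hypothetical and chosen only for illustration. How the paper encodes words as numbers and how it optimises the loss are not stated in this summary.

```python
from fractions import Fraction
from itertools import product


def p_adic_valuation(n: int, p: int) -> int:
    """Largest k such that p**k divides n (n must be non-zero)."""
    if n == 0:
        raise ValueError("the valuation of 0 is infinite")
    n = abs(n)
    k = 0
    while n % p == 0:
        n //= p
        k += 1
    return k


def p_adic_abs(n: int, p: int) -> Fraction:
    """p-adic absolute value |n|_p = p**(-v_p(n)), with |0|_p = 0."""
    if n == 0:
        return Fraction(0)
    return Fraction(1, p ** p_adic_valuation(n, p))


def p_adic_loss(a: int, b: int, xs, ys, p: int) -> Fraction:
    """Sum of p-adic absolute values of the residuals y - (a*x + b)."""
    return sum((p_adic_abs(y - (a * x + b), p) for x, y in zip(xs, ys)),
               Fraction(0))


def fit_line(xs, ys, p: int, search=range(-8, 9)):
    """Brute-force search for integer (a, b) minimising the p-adic loss.

    Assumption: coefficients are small integers; this is an illustrative
    search, not an efficient or general optimiser.
    """
    return min(product(search, search),
               key=lambda ab: p_adic_loss(ab[0], ab[1], xs, ys, p))


# Toy data: y = 2x + 1, with the last point perturbed by 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 10]
a, b = fit_line(xs, ys, p=2)
print(a, b, p_adic_loss(a, b, xs, ys, 2))
```

Under a p-adic metric, a residual is "small" when it is divisible by a high power of p, not when its ordinary magnitude is small. In this toy case the fitted line passes exactly through the three unperturbed points and pays a fixed cost for the perturbed one, which is the kind of behaviour that distinguishes p-adic fitting from least squares.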