CL, Jun 13, 2019

Meaning to Form: Measuring Systematicity as Information

Tiago Pimentel, Arya D. McCarthy, Damián E. Blasi, Brian Roark, Ryan Cotterell

arXiv:1906.05906v231.11097 citations, Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses a longstanding debate in semiotics, of interest to both linguists and AI researchers, over whether the relationship between a word's form and its meaning is arbitrary or systematic. The contribution is incremental, since it applies existing methods to new data.

The authors quantify the systematicity between linguistic signs and their semantics using mutual information and recurrent neural networks across 106 languages. They find a statistically significant reduction in entropy when modeling word forms conditioned on semantic representations, though the effect size is small.

A longstanding debate in semiotics centers on the relationship between linguistic signs and their corresponding semantics: is there an arbitrary relationship between a word form and its meaning, or does some systematic phenomenon pervade? For instance, does the character bigram "gl" have any systematic relationship to the meaning of words like "glisten", "gleam" and "glow"? In this work, we offer a holistic quantification of the systematicity of the sign using mutual information and recurrent neural networks. We employ these in a data-driven and massively multilingual approach to the question, examining 106 languages. We find a statistically significant reduction in entropy when modeling a word form conditioned on its semantic representation. Encouragingly, we also recover well-attested English examples of systematic affixes. We conclude with the meta-point: Our approximate effect size (measured in bits) is quite small. Despite some amount of systematicity between form and meaning, an arbitrary relationship and its resulting benefits dominate human language.
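The "reduction in entropy" in the abstract corresponds to the standard identity I(W; V) = H(W) - H(W | V), where W is the word form and V its meaning. The sketch below illustrates that identity only. It is not the authors' estimator: the abstract says they use recurrent neural networks, while this example swaps in plug-in counts over toy (form, meaning) pairs. The function names and the toy data are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def mutual_information(pairs):
    """Plug-in estimate of I(W; V) = H(W) - H(W | V) from (form, meaning) pairs.

    Assumption: forms and meanings are small discrete symbols, so counting is
    enough. This replaces the neural estimators mentioned in the abstract.
    """
    pairs = list(pairs)
    h_w = entropy(Counter(w for w, _ in pairs))
    h_v = entropy(Counter(v for _, v in pairs))
    h_wv = entropy(Counter(pairs))
    h_w_given_v = h_wv - h_v  # chain rule: H(W | V) = H(W, V) - H(V)
    return h_w - h_w_given_v


# Hypothetical toy data: form (an onset bigram) paired with a coarse meaning class.
arbitrary = [("gl", "light"), ("gl", "other"), ("st", "light"), ("st", "other")]
systematic = [("gl", "light"), ("st", "other")]

print(mutual_information(arbitrary))   # form tells us nothing about meaning
print(mutual_information(systematic))  # form fully determines meaning
```

In the first toy set, knowing the meaning does not reduce uncertainty about the form, so the estimate is zero bits. In the second, it removes all of it. The abstract's finding sits near the arbitrary end: a measurable but small number of bits.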

