CL · Oct 3, 2019

Modeling Color Terminology Across Thousands of Languages

Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

arXiv:1910.01531v1 · 30.1998 citations · Has Code

Originality: Incremental advance

AI Analysis

The paper provides empirical computational support for linguistic theories of color terminology, but the advance is incremental because it builds on established hypotheses.

The paper sets out to validate and refine the Berlin and Kay hypotheses on basic color terms by applying computational linguistic metrics to cross-linguistic data. It finds a strong correlation with the basic/secondary partition (γ = 0.96) and suggests treating that partition as a spectrum rather than a dichotomy.

There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Collectively, the 14 empirically grounded computational linguistic metrics we design, as well as their aggregation, correlate strongly with both the Berlin and Kay basic/secondary color term partition (γ = 0.96) and their hypothesized universal acquisition sequence. The measures and results provide further empirical evidence from computational linguistics in support of their claims, as well as additional nuance: they suggest treating the partition as a spectrum instead of a dichotomy.

