CL · Oct 3, 2019

Modeling Color Terminology Across Thousands of Languages

arXiv:1910.01531v1998 citations
Originality: Incremental advance
AI Analysis

This paper provides empirical computational support for linguistic theories of color terminology, but the advance is incremental because it builds on established hypotheses.

The paper validates and refines the Berlin and Kay hypotheses on basic color terms by applying computational linguistic metrics to cross-linguistic data. It finds a strong correlation (gamma=0.96) with their basic/secondary color term partition and suggests treating that partition as a spectrum rather than a dichotomy.

There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969). This paper employs a set of diverse measures on massively cross-linguistic data to operationalize and critique the Berlin and Kay color term hypotheses. Collectively, the 14 empirically grounded computational linguistic metrics we design, as well as their aggregation, correlate strongly with both the Berlin and Kay basic/secondary color term partition (gamma=0.96) and their hypothesized universal acquisition sequence. The measures and results provide further empirical evidence from computational linguistics in support of their claims, as well as additional nuance: they suggest treating the partition as a spectrum instead of a dichotomy.
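
The abstract reports the agreement with the basic/secondary partition as gamma=0.96 without spelling out the statistic. A minimal sketch follows, assuming that gamma denotes Goodman and Kruskal's gamma, a rank correlation that counts concordant and discordant pairs and ignores ties. The function name, the toy labels, and the toy scores are hypothetical illustrations, not data or code from the paper.

```python
from itertools import combinations


def goodman_kruskal_gamma(xs, ys):
    """Return (C - D) / (C + D), where C and D count concordant and
    discordant pairs; pairs tied on either variable are ignored."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical toy data: 1 = basic term, 0 = secondary term,
# paired with a made-up aggregate "basicness" score per term.
is_basic = [1, 1, 1, 0, 0, 0]
score = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

gamma = goodman_kruskal_gamma(is_basic, score)
print(f"gamma = {gamma:.3f}")
```

Because one variable is a binary partition, only pairs made of one basic and one secondary term contribute; gamma then measures how often the score ranks the basic term higher. A value near 1 means the score almost always does so, which is consistent with reading the score as a graded spectrum rather than a hard dichotomy.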

Code Implementations: 1 repo
