CL · Oct 15, 2025

Quantifying Phonosemantic Iconicity Distributionally in 6 Languages

arXiv:2510.14040v2 · 1 citation · h-index: 3 · IJCNLP-AACL
Originality: Incremental advance
AI Analysis

This work addresses the challenge of measuring phonosemantic iconicity at scale, a question relevant to linguistics and cognitive science. It is an incremental advance: it applies new quantitative methods to existing theories.

The paper quantifies systematic relationships between phonetics and semantics by analyzing how morphemes' phonetic and semantic similarity spaces align in 6 diverse languages. It reports new interpretable phonosemantic alignments and crosslinguistic patterns, and it tests 5 previously hypothesized alignments, finding support for some and mixed results for others.

Language is, as commonly theorized, largely arbitrary. Yet systematic relationships between phonetics and semantics have been observed in many specific cases. To what degree could those systematic relationships manifest themselves in large-scale quantitative investigations, both in previously identified and unidentified phenomena? This work undertakes a distributional approach to quantifying phonosemantic iconicity at scale across 6 diverse languages (English, Spanish, Hindi, Finnish, Turkish, and Tamil). In each language, we analyze the alignment of morphemes' phonetic and semantic similarity spaces with a suite of statistical measures, and discover an array of interpretable phonosemantic alignments not previously identified in the literature, along with crosslinguistic patterns. We also analyze 5 previously hypothesized phonosemantic alignments, finding support for some such alignments and mixed results for others.
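
To make "alignment of phonetic and semantic similarity spaces" concrete, here is a minimal sketch of one way such an alignment could be measured. It is not the paper's method, and the abstract does not say which statistical measures the authors use. The sketch assumes a Mantel-style permutation test, uses normalized edit distance over orthographic strings as a stand-in for phonetic distance, and uses cosine distance over made-up 2-dimensional vectors as a stand-in for semantic distance. All function names and the toy data are hypothetical.

```python
import numpy as np


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; an assumed stand-in for a real phonetic distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def phonetic_distances(forms):
    """Pairwise edit distance, normalized by the longer form's length."""
    n = len(forms)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            norm = max(len(forms[i]), len(forms[j])) or 1
            d[i, j] = d[j, i] = edit_distance(forms[i], forms[j]) / norm
    return d


def cosine_distances(vectors):
    """Pairwise cosine distance between row vectors (must be non-zero)."""
    v = np.asarray(vectors, dtype=float)
    unit = v / np.linalg.norm(v, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T


def mantel_test(d1, d2, n_perm=999, seed=0):
    """Correlate two distance matrices; one-sided permutation p-value."""
    rng = np.random.default_rng(seed)
    iu = np.triu_indices_from(d1, k=1)  # each unordered pair counted once
    observed = np.corrcoef(d1[iu], d2[iu])[0, 1]
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(d1.shape[0])
        permuted = d1[np.ix_(p, p)]  # relabel items, keep matrix structure
        if np.corrcoef(permuted[iu], d2[iu])[0, 1] >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data only: invented forms and invented "embeddings", not real results.
forms = ["glim", "glit", "snif", "snor"]
vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
r, p = mantel_test(phonetic_distances(forms), cosine_distances(vectors))
print(f"correlation={r:.3f}, p={p:.3f}")
```

A positive correlation here would mean that items that sound alike also tend to be close in meaning. With only four items the permutation p-value is not informative; a real analysis would need a large morpheme inventory, a proper phonetic representation, and trained semantic embeddings.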
