CL CV Dec 13, 2024

A Grounded Typology of Word Classes

Coleman Haley, Sharon Goldwater, Edoardo Ponti

arXiv:2412.10369v19.112 citations h-index: 32

Originality: Incremental advance

AI Analysis

This paper provides a quantitative, language-agnostic method for typological studies and offers insights into semantic function across languages, though its advance is incremental: it applies existing multimodal models to a new linguistic task.

The authors tackle the problem of quantifying semantic contentfulness across languages by introducing a groundedness measure based on perceptual data and information theory. They find universal trends in the groundedness of word classes, and the measure partly correlates with psycholinguistic concreteness norms in English.

We propose a grounded approach to meaning in language typology. We treat data from perceptual modalities, such as images, as a language-agnostic representation of meaning. Hence, we can quantify the function–form relationship between images and captions across languages. Inspired by information theory, we define "groundedness", an empirical measure of contextual semantic contentfulness (formulated as a difference in surprisal) which can be computed with multilingual multimodal language models. As a proof of concept, we apply this measure to the typology of word classes. Our measure captures the contentfulness asymmetry between functional (grammatical) and lexical (content) classes across languages, but contradicts the view that functional classes do not convey content. Moreover, we find universal trends in the hierarchy of groundedness (e.g., nouns > adjectives > verbs), and show that our measure partly correlates with psycholinguistic concreteness norms in English. We release a dataset of groundedness scores for 30 languages. Our results suggest that the grounded typology approach can provide quantitative evidence about semantic function in language.
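
The abstract describes groundedness only as "a difference in surprisal", so the following is a minimal sketch under stated assumptions rather than the authors' implementation. It assumes the difference is taken between a token's surprisal under a text-only model and its surprisal under an image-conditioned model, with positive values meaning the image made the token more predictable. The sign convention, the use of bits, the function names, and the toy probabilities are all hypothetical; in the paper the probabilities would come from multilingual multimodal language models.

```python
import math
from collections import defaultdict


def surprisal(p):
    """Surprisal in bits of an event with probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(p)


def groundedness(p_text_only, p_with_image):
    """Assumed form: text-only surprisal minus image-conditioned surprisal.

    Positive when conditioning on the image makes the token more predictable.
    """
    return surprisal(p_text_only) - surprisal(p_with_image)


def mean_groundedness_by_class(tokens):
    """Average groundedness per word class.

    `tokens` is an iterable of (word, word_class, p_text_only, p_with_image).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _word, word_class, p_text, p_img in tokens:
        totals[word_class] += groundedness(p_text, p_img)
        counts[word_class] += 1
    return {cls: totals[cls] / counts[cls] for cls in totals}


# Toy, made-up probabilities for illustration only.
toy_tokens = [
    ("dog", "NOUN", 0.125, 0.5),
    ("cat", "NOUN", 0.25, 0.25),
    ("the", "DET", 0.5, 0.5),
]
class_means = mean_groundedness_by_class(toy_tokens)
```

Averaging per word class mirrors the kind of comparison the abstract reports (for example, nouns versus function words), but the aggregation actually used in the paper may differ.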
