CL Jun 20, 2024

Evaluating Contextualized Representations of (Spanish) Ambiguous Words: A New Lexical Resource and Empirical Analysis

Pamela D. Rivière, Anne L. Beatty-Martínez, Sean Trott

arXiv:2406.14678v3 10.013 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the lack of systematic evaluation of contextualized word embeddings in languages other than English, focusing on Spanish. The contribution is incremental: it extends existing evaluation methods to a new linguistic context.

The researchers evaluated how well language models represent ambiguous Spanish words by building a new dataset of minimal-pair sentences and comparing model embeddings against human relatedness judgments. They found that BERT-based models capture some of the variance in human judgments but fall short of the human benchmark, and, in exploratory analyses, that performance scales with model size.

Lexical ambiguity -- where a single wordform takes on distinct, context-dependent meanings -- serves as a useful tool to compare across different language models' (LMs') ability to form distinct, contextualized representations of the same stimulus. Few studies have systematically compared LMs' contextualized word embeddings for languages beyond English. Here, we evaluate semantic representations of Spanish ambiguous nouns in context in a suite of Spanish-language monolingual and multilingual BERT-based models. We develop a novel dataset of minimal-pair sentences evoking the same or different sense for a target ambiguous noun. In a pre-registered study, we collect contextualized human relatedness judgments for each sentence pair. We find that various BERT-based LMs' contextualized semantic representations capture some variance in human judgments but fall short of the human benchmark. In exploratory work, we find that performance scales with model size. We also identify stereotyped trajectories of target noun disambiguation as a proportion of traversal through a given LM family's architecture, which we partially replicate in English. We contribute (1) a dataset of controlled, Spanish sentence stimuli with human relatedness norms, and (2) to our evolving understanding of the impact that LM specification (architectures, training protocols) exerts on contextualized embeddings.
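The comparison between model embeddings and human relatedness judgments described above can be illustrated with a minimal sketch. It assumes cosine similarity between the two contextualized embeddings of the target noun and Spearman rank correlation against human ratings; the abstract does not state which metrics the authors used, so both are assumptions. The vectors and ratings below are hypothetical toy values, not data from the paper, and a real pipeline would extract the embeddings from a BERT-based model.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical toy data: (target-noun embedding in sentence A,
# target-noun embedding in sentence B, mean human relatedness rating).
toy_pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], 4.8),  # same sense
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], 0.5),  # different sense
    ([0.5, 0.5, 0.0], [0.4, 0.6, 0.1], 4.1),  # same sense
    ([0.0, 0.2, 1.0], [0.7, 0.1, 0.3], 1.7),  # different sense
]

model_similarity = [cosine_similarity(a, b) for a, b, _ in toy_pairs]
human_relatedness = [rating for _, _, rating in toy_pairs]
rho = spearman(model_similarity, human_relatedness)
print(f"Spearman rho between model and human scores: {rho:.3f}")
```

Running the same computation on embeddings taken from each layer of a model would give one correlation per layer, which is one plausible way to trace how disambiguation develops through the architecture.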
