CL May 27, 2021

RAW-C: Relatedness of Ambiguous Words--in Context (A New Lexical Resource for English)

arXiv:2105.13266v1, 25 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the evaluation of lexical semantics in NLP by providing a new benchmark dataset for researchers and practitioners. The contribution is incremental: it builds on existing embedding methods rather than introducing a new paradigm.

The authors address the challenge of evaluating how well contextualized word embeddings capture the continuous nature of word meaning. They introduce RAW-C, a dataset of graded human relatedness judgments for ambiguous words in context, with an average inter-annotator agreement of 0.79. They find that cosine distance computed from BERT and ELMo embeddings correlates with human judgments, but systematically underestimates same-sense similarity and overestimates different-sense similarity.
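The comparison between embedding distance and human judgments can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the vectors and ratings are made-up toy values (not RAW-C data), the function names are hypothetical, and the use of Spearman rank correlation is an assumption, since the summary says only that the two measures "correlate".

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity for two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy stand-ins for the contextualized embeddings of one target word
# in each sentence of a pair (real BERT/ELMo vectors are much larger).
toy_pairs = [
    ([1.0, 0.0], [1.0, 0.1]),   # imagined same-sense pair
    ([1.0, 0.0], [0.5, 0.5]),   # imagined related-sense pair
    ([1.0, 0.0], [0.0, 1.0]),   # imagined unrelated-sense pair
]
toy_human_relatedness = [4.0, 2.5, 0.5]  # made-up ratings

distances = [cosine_distance(u, v) for u, v in toy_pairs]
rho = spearman(distances, toy_human_relatedness)
print(distances, rho)
```

Because distance and relatedness run in opposite directions, a model that tracks human judgments would be expected to give a negative correlation here.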

Most words are ambiguous--i.e., they convey distinct meanings in different contexts--and even the meanings of unambiguous words are context-dependent. Both phenomena present a challenge for NLP. Recently, the advent of contextualized word embeddings has led to success on tasks involving lexical ambiguity, such as Word Sense Disambiguation. However, there are few tasks that directly evaluate how well these contextualized embeddings accommodate the more continuous, dynamic nature of word meaning--particularly in a way that matches human intuitions. We introduce RAW-C, a dataset of graded, human relatedness judgments for 112 ambiguous words in context (with 672 sentence pairs total), as well as human estimates of sense dominance. The average inter-annotator agreement (assessed using a leave-one-annotator-out method) was 0.79. We then show that a measure of cosine distance, computed using contextualized embeddings from BERT and ELMo, correlates with human judgments, but that cosine distance also systematically underestimates how similar humans find uses of the same sense of a word to be, and systematically overestimates how similar humans find uses of different-sense homonyms. Finally, we propose a synthesis between psycholinguistic theories of the mental lexicon and computational models of lexical semantics.
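One plausible reading of the leave-one-annotator-out agreement measure is sketched below. The abstract does not spell out the procedure, so the details are assumptions: each annotator's ratings are correlated with the mean ratings of all other annotators, the per-annotator correlations are averaged, Pearson correlation is used, and every annotator is assumed to have rated every item. The function names and the ratings matrix are hypothetical toy values, not RAW-C data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def leave_one_out_agreement(ratings):
    """ratings[a][i] is annotator a's rating of item i (complete matrix assumed).

    For each annotator, correlate their ratings with the item-wise mean
    of everyone else's ratings, then average those correlations.
    """
    n_annotators = len(ratings)
    n_items = len(ratings[0])
    scores = []
    for held_out in range(n_annotators):
        others = [ratings[a] for a in range(n_annotators) if a != held_out]
        mean_others = [
            sum(r[i] for r in others) / len(others) for i in range(n_items)
        ]
        scores.append(pearson(ratings[held_out], mean_others))
    return sum(scores) / len(scores)

# Made-up ratings: three annotators, four sentence pairs.
toy_ratings = [
    [4, 1, 3, 0],
    [3, 0, 4, 1],
    [4, 0, 3, 1],
]
agreement = leave_one_out_agreement(toy_ratings)
print(agreement)
```

With real data, where each annotator typically rates only a subset of items, the mean of the others would need to be computed over the items that annotator actually saw.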

Code Implementations: 1 repo