Don't Neglect the Obvious: On the Role of Unambiguous Words in Word Sense Disambiguation
For NLP researchers working on WSD, this work addresses a specific bottleneck, the poor coverage of unambiguous words in sense-annotated datasets, and offers an incremental improvement to that coverage.
The paper tackles limited coverage in Word Sense Disambiguation (WSD), tracing part of the problem to the poor annotation of unambiguous words in existing corpora. It proposes a method to annotate these words and shows that the resulting annotations improve both the coverage and the quality of a state-of-the-art model, leading to better WSD results.
State-of-the-art methods for Word Sense Disambiguation (WSD) combine two ingredients: the power of pre-trained language models and a propagation method to extend the coverage of such models. This propagation is needed because current sense-annotated corpora lack coverage of many instances in the underlying sense inventory (usually WordNet). At the same time, unambiguous words make up a large portion of all words in WordNet, while being poorly covered in existing sense-annotated corpora. In this paper, we propose a simple method to provide annotations for most unambiguous words in a large corpus. We introduce the UWA (Unambiguous Word Annotations) dataset and show how a state-of-the-art propagation-based model can use it to extend the coverage and quality of its word sense embeddings by a significant margin, improving on its original results on WSD.
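The intuition behind annotating unambiguous words can be illustrated with a minimal sketch. It assumes a hypothetical toy sense inventory standing in for WordNet (the sense labels such as `oxygen#1` are invented for illustration), hypothetical two-dimensional vectors standing in for contextual embeddings from a pre-trained language model, and sense embeddings built by simply averaging the vectors of annotated instances. The helper names `annotate_unambiguous`, `sense_embeddings`, and `coverage` are illustrative only; this is not the UWA construction pipeline or the propagation method of the model the paper extends.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (lemma, part-of-speech)

# Hypothetical toy sense inventory standing in for WordNet:
# each (lemma, POS) pair maps to its list of sense labels (invented, not real WordNet data).
INVENTORY: Dict[Key, List[str]] = {
    ("bank", "n"): ["bank#1", "bank#2"],  # ambiguous: two senses
    ("oxygen", "n"): ["oxygen#1"],        # unambiguous: one sense
    ("inhale", "v"): ["inhale#1"],        # unambiguous: one sense
}


def annotate_unambiguous(tokens: List[Key]) -> List[Optional[str]]:
    """Tag a token with its sense only if its (lemma, POS) has exactly one sense."""
    labels: List[Optional[str]] = []
    for key in tokens:
        senses = INVENTORY.get(key, [])
        labels.append(senses[0] if len(senses) == 1 else None)
    return labels


def sense_embeddings(instances: List[Tuple[str, List[float]]]) -> Dict[str, List[float]]:
    """Average the contextual vectors of all instances annotated with each sense."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = defaultdict(int)
    for sense, vec in instances:
        if sense not in sums:
            sums[sense] = [0.0] * len(vec)
        sums[sense] = [s + v for s, v in zip(sums[sense], vec)]
        counts[sense] += 1
    return {s: [x / counts[s] for x in total] for s, total in sums.items()}


def coverage(embeddings: Dict[str, List[float]]) -> float:
    """Fraction of inventory senses that received at least one embedding."""
    all_senses = {s for senses in INVENTORY.values() for s in senses}
    return len(all_senses & set(embeddings)) / len(all_senses)


# A tiny "manually annotated" set that covers only one sense of "bank".
gold = [("bank#1", [1.0, 0.0])]

# An unannotated sentence; the vectors are hypothetical stand-ins for
# contextual embeddings produced by a pre-trained language model.
tokens = [("inhale", "v"), ("oxygen", "n"), ("bank", "n")]
vectors = [[0.0, 1.0], [0.5, 0.5], [0.9, 0.1]]

labels = annotate_unambiguous(tokens)
auto = [(label, vec) for label, vec in zip(labels, vectors) if label is not None]

before = coverage(sense_embeddings(gold))
after = coverage(sense_embeddings(gold + auto))
print(before, after)  # 0.25 0.75
```

In this toy setting, the unambiguous tokens receive labels for free, raising sense coverage from one of four senses to three of four without any manual annotation. The ambiguous token "bank" is left untagged, and `bank#2` stays uncovered; filling such remaining gaps is the job of the propagation method described above.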