Multilingual Word Sense Disambiguation with Unified Sense Representation
This work addresses the scarcity of annotations for word sense disambiguation in non-English languages, a real obstacle to improving NLP models in multilingual settings. The contribution appears incremental, however, because it builds on existing multilingual resources and transfer learning approaches.
The paper tackles the problem of multilingual word sense disambiguation (MWSD) by proposing a system that builds unified sense representations across languages using BabelNet, transferring annotations from resource-rich to resource-poor languages to address annotation scarcity. Evaluations on SemEval-13 and SemEval-15 datasets demonstrate its effectiveness, though no specific performance numbers are provided.
As a key natural language processing (NLP) task, word sense disambiguation (WSD) evaluates how well NLP models understand the lexical semantics of words in specific contexts. Benefiting from large-scale annotation, current WSD systems have achieved impressive performance in English by combining supervised learning with lexical knowledge. However, such success is hard to replicate in other languages, where only limited annotations are available. In this paper, based on the multilingual lexicon BabelNet, which describes the same set of concepts across languages, we propose building knowledge-based and supervised Multilingual Word Sense Disambiguation (MWSD) systems. We build unified sense representations for multiple languages and address the annotation scarcity problem for MWSD by transferring annotations from resource-rich languages to resource-poor ones. With the unified sense representations, annotations from multiple languages can be trained jointly to benefit MWSD tasks. Evaluations on the SemEval-13 and SemEval-15 datasets demonstrate the effectiveness of our methodology.
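To make the idea of a unified sense representation concrete, the following is a minimal toy sketch in Python, not the authors' system. It assumes a tiny hand-written lexicon standing in for BabelNet, with made-up concept identifiers such as `c:bank_finance` (these are not real BabelNet synset IDs), and it replaces learned representations with simple co-occurrence counts. The names `LEXICON`, `ANNOTATIONS`, `context_features`, `train`, and `disambiguate` are all hypothetical. The point it illustrates is only that, once senses in every language are keyed by the same concept inventory, annotations from one language can inform predictions in another and annotations from several languages can be pooled.

```python
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for BabelNet:
# (language, lemma) -> candidate language-independent concept IDs.
LEXICON = {
    ("en", "bank"): ["c:bank_finance", "c:bank_river"],
    ("en", "money"): ["c:money"],
    ("en", "loan"): ["c:loan"],
    ("en", "river"): ["c:river"],
    ("en", "water"): ["c:water"],
    ("es", "banco"): ["c:bank_finance", "c:bench"],
    ("es", "dinero"): ["c:money"],
    ("es", "parque"): ["c:park"],
}

# Sense annotations as (language, context words, gold concept).
# Most come from English; one comes from Spanish to show joint training.
ANNOTATIONS = [
    ("en", ["money", "loan"], "c:bank_finance"),
    ("en", ["river", "water"], "c:bank_river"),
    ("es", ["parque"], "c:bench"),
]


def context_features(lang, words):
    """Map context words to concept IDs so features are shared across languages."""
    feats = Counter()
    for word in words:
        for concept in LEXICON.get((lang, word), []):
            feats[concept] += 1
    return feats


def train(annotations):
    """Accumulate one context profile per concept, pooling all languages."""
    profiles = defaultdict(Counter)
    for lang, context, concept in annotations:
        profiles[concept].update(context_features(lang, context))
    return profiles


def disambiguate(profiles, lang, lemma, context):
    """Pick the candidate concept whose profile best overlaps the context."""
    feats = context_features(lang, context)

    def score(concept):
        return sum(profiles[concept][f] * n for f, n in feats.items())

    # Ties are broken by the candidate order in LEXICON.
    return max(LEXICON[(lang, lemma)], key=score)


profiles = train(ANNOTATIONS)
# The Spanish word is disambiguated using a profile learned from English data.
print(disambiguate(profiles, "es", "banco", ["dinero"]))
# The Spanish annotation contributes to the same shared profiles.
print(disambiguate(profiles, "es", "banco", ["parque"]))
```

In this sketch the financial sense of the Spanish word "banco" is never annotated in Spanish; the prediction relies on the English annotation for the same concept. A realistic system would replace the count-based profiles with learned sense and context representations, which this sketch does not attempt.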