CL | May 17, 2024

Multilingual Substitution-based Word Sense Induction

arXiv:2405.11086v181 citations | h-index: 4 | LREC
Originality: Incremental advance
AI Analysis

This work addresses the need for unsupervised WSI methods in lower-resourced languages that lack lexical resources. The advance is incremental, since it builds on existing multilingual language models.

The authors adapt Word Sense Induction (WSI) to multiple languages by developing substitution-based methods that support 100 languages with minimal adaptation and that perform comparably to existing monolingual approaches on English datasets.

Word Sense Induction (WSI) is the task of discovering the senses of an ambiguous word by grouping usages of this word into clusters corresponding to these senses. Many approaches have been proposed to solve WSI in English and a few other languages, but these approaches are not easily adaptable to new languages. We present multilingual substitution-based WSI methods that support any of the 100 languages covered by the underlying multilingual language model, with minimal to no adaptation required. Despite their multilingual capabilities, our methods perform on par with existing monolingual approaches on popular English WSI datasets. At the same time, they will be most useful for lower-resourced languages, which lack the lexical resources available for English and thus have a higher demand for unsupervised methods like WSI.
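
To make the idea of substitution-based WSI concrete, here is a minimal sketch, not the authors' implementation. It assumes that each usage of the target word has already been turned into a list of lexical substitutes (in practice these would come from a multilingual masked language model; here they are hand-written, hypothetical examples). Each usage becomes a bag-of-substitutes vector, and usages are grouped by a simple average-link agglomerative clustering with a cosine-similarity threshold. The function names, the threshold value, and the clustering choice are all assumptions made for illustration.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-substitutes vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def induce_senses(substitutes, threshold=0.2):
    """Cluster usages by their substitutes (hypothetical helper).

    Repeatedly merges the two most similar clusters (average link)
    until the best similarity drops below `threshold`.
    Returns one cluster label per usage.
    """
    vectors = [Counter(s) for s in substitutes]
    clusters = [[i] for i in range(len(vectors))]

    def avg_sim(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_sim, (a, b) = max(
            (avg_sim(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best_sim < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(vectors)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hand-written substitutes for four usages of "bank" (illustrative only):
# two financial usages followed by two riverside usages.
usage_substitutes = [
    ["institution", "branch", "office", "lender"],
    ["lender", "institution", "company", "branch"],
    ["shore", "side", "edge", "bed"],
    ["shore", "edge", "side", "margin"],
]

labels = induce_senses(usage_substitutes)
print(labels)  # [0, 0, 1, 1]
```

Because the only language-specific step is substitute generation, swapping in a multilingual model at that step is what would let the same clustering code serve many languages.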

