CL, LG · Jan 7, 2021

Homonym Identification using BERT -- Using a Clustering Approach

arXiv:2101.02398v1
Originality: Synthesis-oriented
AI Analysis

This work addresses homonym identification for Word Sense Disambiguation (WSD) systems that require coarse-grained sense partitions. It is an incremental step toward improving WSD accuracy.

This paper investigates whether contextual information is sufficient for identifying homonymous words by using BERT embeddings. The authors applied various clustering algorithms to these embeddings and visualized them in a lower-dimensional space to assess the feasibility of the clustering process.

Homonym identification is important for WSD systems that require coarse-grained partitions of senses. The goal of this project is to determine whether contextual information is sufficient for identifying a homonymous word. To capture the context, BERT embeddings are used as opposed to Word2Vec, which conflates all senses of a word into one vector. SemCor is leveraged to retrieve the embeddings. Various clustering algorithms are applied to the embeddings. Finally, the embeddings are visualized in a lower-dimensional space to understand the feasibility of the clustering process.
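The cluster-then-project idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline. The vectors are synthetic stand-ins for the per-occurrence BERT embeddings of one target word. The abstract does not name the clustering algorithms or the projection method, so 2-means and PCA are assumed here. All function names (`make_synthetic_contexts`, `two_means`, `pca_project`, `two_cluster_agreement`) are hypothetical.

```python
import numpy as np


def make_synthetic_contexts(n_per_sense=30, dim=16, seed=0):
    """Synthetic stand-in for contextual embeddings of ONE target word.

    Assumption: two tight Gaussian blobs play the role of two unrelated
    (homonymous) senses. Real BERT vectors would replace this function.
    """
    rng = np.random.default_rng(seed)
    centers = np.zeros((2, dim))
    centers[0, 0] = 5.0
    centers[1, 0] = -5.0
    X = np.vstack(
        [c + rng.normal(scale=0.5, size=(n_per_sense, dim)) for c in centers]
    )
    y = np.repeat([0, 1], n_per_sense)  # gold sense label per occurrence
    return X, y


def two_means(X, n_iter=50):
    """Plain Lloyd iterations for k=2 with a deterministic farthest-point init."""
    first = X[0]
    second = X[np.linalg.norm(X - first, axis=1).argmax()]
    centroids = np.stack([first, second])
    for _ in range(n_iter):
        # Distance of every point to both centroids, shape (n_points, 2).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(2)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1), centroids


def pca_project(X, n_components=2):
    """Project onto the top principal components via SVD of the centered data."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def two_cluster_agreement(pred, true):
    """Fraction of points whose cluster matches the gold sense, up to label swap."""
    match = float(np.mean(pred == true))
    return max(match, 1.0 - match)


X, y = make_synthetic_contexts()
labels, _ = two_means(X)
coords = pca_project(X)  # 2-D coordinates, ready for a scatter plot
agreement = two_cluster_agreement(labels, y)
print("cluster/sense agreement:", agreement)
```

On real data the blobs would rarely be this clean. Comparing cluster assignments against SemCor sense annotations, and plotting `coords` coloured by sense, is one way to judge whether context alone separates the senses.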

