Topology of Word Embeddings: Singularities Reflect Polysemy
This work offers a topological criterion for distinguishing polysemous from monosemous words, which could benefit researchers working on word sense disambiguation and on the geometric properties of word embeddings.
This paper proposes that the embeddings of polysemous words lie at singular points of a pinched manifold, while those of monosemous words lie at regular points. The authors introduce a topological measure of polysemy, based on persistent homology, that correlates with a word's number of meanings, and they report competitive results on the SemEval-2010 Word Sense Induction & Disambiguation task.
The manifold hypothesis suggests that word vectors live on a submanifold within their ambient vector space. We argue that we should, more accurately, expect them to live on a pinched manifold: a singular quotient of a manifold obtained by identifying some of its points. The identified, singular points correspond to polysemous words, i.e. words with multiple meanings. Our point of view suggests that monosemous and polysemous words can be distinguished based on the topology of their neighbourhoods. We present two kinds of empirical evidence to support this point of view: (1) We introduce a topological measure of polysemy based on persistent homology that correlates well with the actual number of meanings of a word. (2) We propose a simple, topologically motivated solution to the SemEval-2010 task on Word Sense Induction & Disambiguation that produces competitive results.
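To make the idea of distinguishing words by the topology of their neighbourhoods concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not specify their measure. The sketch assumes one simple proxy: take the k nearest neighbours of a word vector, remove the word itself (a "punctured" neighbourhood), project the neighbours onto the unit sphere around the word, and count the connected components that survive at a chosen scale. This uses degree-0 persistent homology of a Vietoris-Rips filtration, whose death times equal the edge lengths of a minimum spanning tree. The function names and the parameters `k` and `scale` are hypothetical choices made for illustration.

```python
import numpy as np


def h0_death_times(points):
    """Death times of degree-0 persistent homology (Vietoris-Rips).

    These coincide with the edge lengths of a minimum spanning tree,
    computed here with Kruskal's algorithm and a union-find structure.
    """
    n = len(points)
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    rows, cols = np.triu_indices(n, k=1)
    order = np.argsort(dists[rows, cols])
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    deaths = []
    for idx in order:
        a, b = find(rows[idx]), find(cols[idx])
        if a != b:
            # Two components merge at this distance: one H0 class dies.
            parent[a] = b
            deaths.append(dists[rows[idx], cols[idx]])
    return np.array(deaths)


def punctured_neighbourhood_components(embeddings, word_index, k=20, scale=0.5):
    """Count components of a word's punctured neighbourhood at a given scale.

    Assumption (not from the paper): neighbours are projected onto the unit
    sphere centred at the word, and components are counted at one fixed scale.
    Assumes no neighbour coincides exactly with the word vector.
    """
    w = embeddings[word_index]
    d = np.linalg.norm(embeddings - w, axis=1)
    d[word_index] = np.inf  # puncture: exclude the word itself
    neighbours = np.argsort(d)[:k]
    directions = embeddings[neighbours] - w
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    deaths = h0_death_times(directions)
    # Each H0 class still alive at `scale` is a separate component;
    # add one for the class that never dies.
    return int(np.sum(deaths > scale)) + 1
```

Under these assumptions, a word whose neighbours approach it from several well-separated directions, as one would expect at a pinch point, yields a count above one, while a word with neighbours from a single direction yields a count of one. A fuller treatment would look at the whole persistence diagram, including higher degrees, rather than a single threshold; libraries such as GUDHI or Ripser could be used for that.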