AutoExtend: Extending Word Embeddings to Embeddings for Synsets and Lexemes
This work addresses the need for more precise semantic representations in natural language processing, particularly for tasks that require disambiguation. The contribution is incremental in that it builds on existing word embedding methods rather than replacing them.
The paper tackles the problem of learning embeddings for synsets and lexemes by introducing AutoExtend, a system that extends existing word embeddings using a lexical resource such as WordNet, without requiring an additional training corpus. It reports state-of-the-art performance on word similarity and word sense disambiguation tasks.
We present \textit{AutoExtend}, a system to learn embeddings for synsets and lexemes. It is flexible in that it can take any word embeddings as input and does not need an additional training corpus. The synset/lexeme embeddings obtained live in the same vector space as the word embeddings. A sparse tensor formalization guarantees efficiency and parallelizability. We use WordNet as a lexical resource, but AutoExtend can be easily applied to other resources like Freebase. AutoExtend achieves state-of-the-art performance on word similarity and word sense disambiguation tasks.
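To make the phrase "live in the same vector space" concrete, the following is a minimal sketch, not AutoExtend's learning procedure. It assumes toy 3-dimensional word vectors with made-up values, two hypothetical synset names (`legal_suit`, `clothing_suit`), and a simple averaging baseline in place of the learned solution. The word-to-synset membership is stored sparsely (only existing links are listed), which loosely mirrors why a sparse formalization can be efficient.

```python
import math

# Toy word embeddings (made-up values, for illustration only).
word_vectors = {
    "suit":    [0.5, 0.5, 0.0],
    "lawsuit": [1.0, 0.0, 0.0],
    "case":    [0.8, 0.0, 0.2],
    "costume": [0.0, 1.0, 0.0],
}

# Sparse membership: each hypothetical synset lists only its member words.
synset_members = {
    "legal_suit":    ["suit", "lawsuit", "case"],
    "clothing_suit": ["suit", "costume"],
}


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(components) / count for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Averaging baseline (an assumption of this sketch, NOT the AutoExtend method):
# each synset vector is the mean of its member word vectors, so it has the
# same dimensionality as the word vectors and can be compared with them directly.
synset_vectors = {
    synset: mean_vector([word_vectors[w] for w in members])
    for synset, members in synset_members.items()
}

if __name__ == "__main__":
    for synset, vector in synset_vectors.items():
        print(synset, round(cosine(word_vectors["lawsuit"], vector), 3))
```

Because words and synsets share one space in this sketch, a word vector can be scored against candidate synset vectors with an ordinary cosine similarity, which is the kind of comparison a word sense disambiguation system might build on.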