Gaussian Word Embedding with a Wasserstein Distance Loss
This work addresses word representation for natural language processing tasks. It offers a more flexible, distribution-based approach that uses external information as semi-supervision, though it appears incremental because it builds on existing distribution-based embedding methods.
The authors address word representation by proposing Gaussian word embeddings trained with a Wasserstein distance loss, which they argue improves on point-based embeddings by expressing uncertainty and capturing richer semantic information. They evaluate the method on 13 word similarity datasets, one word entailment dataset, and six document classification datasets, with efficient word representation as the stated aim.
Compared with word embeddings based on point representations, distribution-based word embeddings are more flexible in expressing uncertainty and can therefore encode richer semantic information when representing words. The Wasserstein distance provides a natural notion of dissimilarity between probability measures and has a closed-form solution when measuring the distance between two Gaussian distributions. Therefore, with the aim of representing words efficiently, we propose a Gaussian word embedding model with a loss function based on the Wasserstein distance. In addition, external information from ConceptNet is used to semi-supervise the Gaussian word embeddings. To test our hypothesis, we evaluate the model on thirteen datasets from the word similarity task, one dataset from the word entailment task, and six datasets from the downstream document classification task.
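To make the closed-form property concrete, the following is a minimal sketch of the squared 2-Wasserstein distance between two Gaussians, W2^2 = ||m1 - m2||^2 + Tr(S1 + S2 - 2 (S2^(1/2) S1 S2^(1/2))^(1/2)), which is the standard result for Gaussian measures. The function names, the use of NumPy, and the diagonal-covariance shortcut are illustrative assumptions; the abstract does not say which order of the Wasserstein distance or which covariance structure the model uses.

```python
import numpy as np


def psd_sqrt(mat):
    """Square root of a symmetric positive semi-definite matrix (hypothetical helper)."""
    sym = (mat + mat.T) / 2.0             # remove tiny asymmetries from rounding
    vals, vecs = np.linalg.eigh(sym)
    vals = np.clip(vals, 0.0, None)       # guard against small negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def w2_squared_gaussian(mu1, cov1, mu2, cov2):
    """Squared 2-Wasserstein distance between N(mu1, cov1) and N(mu2, cov2)."""
    root2 = psd_sqrt(cov2)
    cross = psd_sqrt(root2 @ cov1 @ root2)
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.trace(cov1 + cov2 - 2.0 * cross)
    return float(mean_term + cov_term)


def w2_squared_diagonal(mu1, var1, mu2, var2):
    """Same distance when both covariances are diagonal (an assumption made here).

    The covariance term reduces to the squared Euclidean distance
    between the two vectors of standard deviations.
    """
    mean_term = np.sum((mu1 - mu2) ** 2)
    std_term = np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)
    return float(mean_term + std_term)


if __name__ == "__main__":
    # Toy 2-dimensional "word" Gaussians; the values are made up for illustration.
    mu_a, var_a = np.array([0.0, 0.0]), np.array([1.0, 4.0])
    mu_b, var_b = np.array([1.0, 2.0]), np.array([4.0, 1.0])
    print(w2_squared_diagonal(mu_a, var_a, mu_b, var_b))
    print(w2_squared_gaussian(mu_a, np.diag(var_a), mu_b, np.diag(var_b)))
```

With diagonal covariances the distance costs only a few vector operations per word pair, which is one plausible reading of why a closed form matters for efficiency; the full-covariance version needs two matrix square roots per pair.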