Learning Meta-Embeddings by Using Ensembles of Embedding Sets
This work addresses the challenge of inconsistent embedding quality for natural language processing tasks, offering an incremental improvement through ensemble methods.
The paper tackled the problem of varying quality and characteristics in word embeddings by proposing an ensemble approach to combine different public embedding sets into meta-embeddings, resulting in better performance on word similarity, analogy tasks, and part-of-speech tagging compared to individual sets.
Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured semantics. Instead of relying on a more advanced algorithm for embedding learning, this paper proposes an ensemble approach of combining different public embedding sets with the aim of learning meta-embeddings. Experiments on word similarity and analogy tasks and on part-of-speech tagging show better performance of meta-embeddings compared to individual embedding sets. One advantage of meta-embeddings is the increased vocabulary coverage. We will release our meta-embeddings publicly.