PSDVec: a Toolbox for Incremental and Scalable Word Embedding
PSDVec gives NLP practitioners a scalable, incremental option for learning word embeddings, although the work itself is an incremental advance that builds on existing embedding methods.
The authors address efficient word embedding learning with PSDVec, a toolbox that combines a weighted low-rank positive semidefinite approximation with a blockwise online learning algorithm. Among popular tools, it achieves the best average performance on 9 benchmark sets and 2 NLP tasks.
PSDVec is a Python/Perl toolbox that learns word embeddings, i.e., mappings of the words in a natural language to continuous vectors that encode the semantic and syntactic regularities between them. PSDVec implements a word embedding learning method based on a weighted low-rank positive semidefinite approximation. To scale up the learning process, we implement a blockwise online learning algorithm that learns the embeddings incrementally. This strategy greatly reduces the time needed to learn embeddings for a large vocabulary, and it can learn the embeddings of new words without re-learning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that have the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners.
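The two ideas above can be illustrated with a minimal, pure-Python sketch. It is not PSDVec's code or API. The function names (`fit_core`, `embed_new_word`), the plain gradient-descent solver, the toy target matrix, and the all-ones weights are hypothetical choices for illustration, and the solver is a stand-in that is not necessarily what PSDVec uses. In practice the target matrix would be derived from corpus co-occurrence statistics (an assumption here). `fit_core` minimizes a weighted squared error over a rank-`dim` factor `V`, so `V V^T` is positive semidefinite by construction. `embed_new_word` then embeds one new word by weighted least squares with the core vectors held fixed, which is one plausible reading of how new words can be added without re-learning the whole vocabulary.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_core(G, W, dim, lr=0.01, iters=3000, seed=0):
    """Hypothetical weighted low-rank PSD approximation via gradient descent.

    Minimizes sum_ij W[i][j] * (G[i][j] - v_i . v_j)^2 over an n x dim matrix V.
    V V^T is positive semidefinite with rank <= dim by construction.
    Assumes G and W are symmetric.
    """
    n = len(G)
    rng = random.Random(seed)
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]
    for _ in range(iters):
        # Weighted residual R_ij = W_ij * (G_ij - v_i . v_j)
        R = [[W[i][j] * (G[i][j] - dot(V[i], V[j])) for j in range(n)]
             for i in range(n)]
        # Gradient of the loss w.r.t. v_i is -4 * sum_j R_ij v_j
        grads = [[-4.0 * sum(R[i][j] * V[j][k] for j in range(n))
                  for k in range(dim)] for i in range(n)]
        V = [[V[i][k] - lr * grads[i][k] for k in range(dim)] for i in range(n)]
    return V


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def embed_new_word(g, w, V, ridge=1e-6):
    """Hypothetical online step: embed one new word with core vectors V fixed.

    Minimizes sum_j w_j (g_j - u . v_j)^2 + ridge * |u|^2, whose normal
    equations are (sum_j w_j v_j v_j^T + ridge I) u = sum_j w_j g_j v_j.
    """
    n, dim = len(V), len(V[0])
    A = [[sum(w[j] * V[j][a] * V[j][c] for j in range(n)) + (ridge if a == c else 0.0)
          for c in range(dim)] for a in range(dim)]
    rhs = [sum(w[j] * g[j] * V[j][a] for j in range(n)) for a in range(dim)]
    return solve(A, rhs)


# Toy example: an exactly rank-2 PSD target over 6 "core" words, unit weights.
V_true = [[1, 0], [0, 1], [1, 1], [1, -1], [0.5, 2], [2, 0.5]]
G = [[dot(a, b) for b in V_true] for a in V_true]
W = [[1.0] * len(G) for _ in G]
V = fit_core(G, W, dim=2)
max_err = max(abs(G[i][j] - dot(V[i], V[j])) for i in range(6) for j in range(6))

# A new word whose target scores against the core words come from a known vector.
u_true = [0.3, -0.7]
g_new = [dot(u_true, v) for v in V]
u = embed_new_word(g_new, [1.0] * 6, V)
print(max_err, u)
```

Because the core vectors stay fixed, each new word costs only one small `dim x dim` solve instead of refitting the full matrix, which illustrates why an online, blockwise scheme can scale to large vocabularies.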