CL · Nov 27, 2015

Category Enhanced Word Embedding

Chunting Zhou, Chonglin Sun, Zhiyuan Liu, Francis C. M. Lau

arXiv:1511.08629v2

Originality Incremental advance

AI Analysis

This work addresses the need for better word representations in natural language processing, though it is incremental as it builds on existing embedding techniques.

The paper improves word embeddings by incorporating document category information. The resulting models outperform several state-of-the-art methods on word analogy and word similarity tasks, and the learned vectors also show superior results on sentiment analysis and text classification.

Distributed word representations have been demonstrated to be effective in capturing semantic and syntactic regularities. Unsupervised representation learning from large unlabeled corpora can learn similar representations for words that have similar co-occurrence statistics. Besides local co-occurrence statistics, global topical information is also important knowledge that may help discriminate one word from another. In this paper, we incorporate category information of documents in the learning of word representations and learn the proposed models in a document-wise manner. Our models outperform several state-of-the-art models in word analogy and word similarity tasks. Moreover, we evaluate the learned word vectors on sentiment analysis and text classification tasks, where they also show superior results. We also learn high-quality category embeddings that reflect topical meanings.
