CL, Jul 5, 2017

Context Aware Document Embedding

arXiv:1707.01521v11.02 citations

Originality: Incremental advance

AI Analysis

This work addresses document representation for NLP tasks, offering an incremental improvement in efficiency and resource usage over existing methods.

The paper proposes a context-aware variant of doc2vec for document embedding. Its novel weight-estimating mechanism uses deep neural networks to assign each word occurrence a weight according to its contribution in context. The resulting model performs comparably to doc2vec initialized with Wikipedia-trained vectors, while being more efficient and requiring no heavy external corpus.

Recently, doc2vec has achieved excellent results in different tasks. In this paper, we present a context aware variant of doc2vec. We introduce a novel weight estimating mechanism that generates weights for each word occurrence according to its contribution in the context, using deep neural networks. Our context aware model can achieve similar results compared to doc2vec initialized by Wikipedia trained vectors, while being much more efficient and free from heavy external corpus. Analysis of context aware weights shows they are a kind of enhanced IDF weights that capture sub-topic level keywords in documents. They might result from deep neural networks that learn hidden representations with the least entropy.
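
The abstract describes the learned weights as a kind of enhanced IDF. For reference, the sketch below shows plain IDF weighting of word vectors, the baseline those weights are compared to. It is not the paper's neural weight estimator: the smoothing formula, the function names, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Smoothed IDF per word: log((1 + N) / (1 + df)) + 1 (an assumed variant)."""
    n_docs = len(docs)
    # Document frequency: count each word once per document.
    df = Counter(word for doc in docs for word in set(doc))
    return {w: math.log((1 + n_docs) / (1 + c)) + 1.0 for w, c in df.items()}


def weighted_doc_vector(doc, word_vectors, weights):
    """Weighted average of the vectors of the words in `doc`."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for word in doc:
        if word not in word_vectors:
            continue  # skip words without a vector
        w = weights.get(word, 1.0)
        for i, x in enumerate(word_vectors[word]):
            total[i] += w * x
        weight_sum += w
    if weight_sum == 0.0:
        return total
    return [t / weight_sum for t in total]


# Toy corpus and hypothetical 2-d word vectors, for illustration only.
docs = [
    ["the", "model", "learns", "embeddings"],
    ["the", "corpus", "is", "large"],
    ["the", "embeddings", "capture", "topics"],
]
toy_vectors = {"the": [1.0, 0.0], "topics": [0.0, 1.0]}

idf = idf_weights(docs)
doc_vec = weighted_doc_vector(["the", "topics"], toy_vectors, idf)
```

Here "the" appears in every document and gets the lowest weight, so the document vector leans toward the rarer word "topics". A context-aware scheme of the kind the abstract describes would instead produce a weight per word occurrence, so the same word could count differently in different contexts.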
