IR · CL · Jun 3, 2019

Contextually Propagated Term Weights for Document Representation

Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma

arXiv:1906.00674v1 · h-index: 24 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the challenge of enriching bag-of-words document representations for tasks such as text classification. It offers a new way to incorporate contextual semantics, though the advance appears incremental because it builds on existing word embedding methods.

The paper improves document representation by redistributing term weights according to contextual similarity, using word embeddings so that semantic meaning is shared among words that occur in similar contexts. In unsupervised evaluations against 8 state-of-the-art baselines, the model achieved the best micro and macro F1 scores across datasets of varying difficulty.
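The redistribution idea can be illustrated with a small sketch. It is not the authors' formulation, which this summary does not spell out. It assumes that cosine similarity between word embeddings stands in for contextual similarity, and the function `redistribute` and its parameters `alpha` (the share of weight that is moved) and `k` (the number of neighbours) are hypothetical. Each word keeps `1 - alpha` of its weight and passes `alpha` to its `k` most similar words in proportion to their positive cosine similarity, so the total weight of the bag of words is preserved.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def redistribute(weights, embeddings, alpha=0.5, k=2):
    """Hypothetical sketch (not the paper's exact scheme): each word keeps
    (1 - alpha) of its weight and spreads alpha of it over its k most similar
    vocabulary words, in proportion to their positive cosine similarity.
    Total weight is preserved."""
    out = {w: 0.0 for w in embeddings}
    for target, weight in weights.items():
        if target not in embeddings or weight == 0.0:
            # No embedding (or nothing to share): the word keeps its weight.
            out[target] = out.get(target, 0.0) + weight
            continue
        # Rank the other vocabulary words by similarity to the target.
        ranked = sorted(
            ((cosine(embeddings[target], vec), w)
             for w, vec in embeddings.items() if w != target),
            reverse=True,
        )
        # Keep only the top-k neighbours that have positive similarity.
        neighbours = [(s, w) for s, w in ranked[:k] if s > 0.0]
        total = sum(s for s, _ in neighbours)
        if total == 0.0:
            # No similar words: nothing is redistributed.
            out[target] += weight
            continue
        out[target] += (1.0 - alpha) * weight
        for s, w in neighbours:
            out[w] += alpha * weight * s / total
    return out
```

For example, with toy embeddings in which `car` lies close to `automobile`, `redistribute({"car": 1.0}, emb, alpha=0.5, k=1)` moves half of the weight of `car` to `automobile`, so a document that mentions only "car" also gets credit for its near-synonym.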

Word embeddings predict a word from its neighbours by learning small, dense embedding vectors. In practice, this prediction corresponds to a semantic score given to the predicted word (i.e., its term weight). We present a novel model that, given a target word, redistributes part of that word's weight (computed with word embeddings) across words that occur in contexts similar to those of the target word. Our model thus aims to simulate how semantic meaning is shared by words occurring in similar contexts, and incorporates this into bag-of-words document representations. Experimental evaluation in an unsupervised setting against 8 state-of-the-art baselines shows that our model yields the best micro and macro F1 scores across datasets of increasing difficulty.
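The link between prediction and term weight can be sketched with a CBOW-style forward pass, as in word2vec. The input vectors of the context words are averaged, scored against every output vector, and normalised with a softmax, and the probability assigned to the target word is read off as its weight. This is an illustrative assumption, not the paper's exact scoring. The names `cbow_scores` and `term_weight` are hypothetical, and a full softmax is used for clarity, whereas word2vec is usually trained with negative sampling or hierarchical softmax.

```python
import math


def cbow_scores(context, in_vecs, out_vecs):
    """CBOW-style forward pass: softmax distribution over the vocabulary
    for the word predicted from the given context words."""
    if not context:
        raise ValueError("context must contain at least one word")
    dim = len(in_vecs[context[0]])
    # Hidden layer: mean of the context words' input vectors.
    h = [0.0] * dim
    for w in context:
        for i, x in enumerate(in_vecs[w]):
            h[i] += x / len(context)
    # One logit per vocabulary word: dot product with its output vector.
    logits = {w: sum(a * b for a, b in zip(h, v)) for w, v in out_vecs.items()}
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {w: math.exp(z - m) for w, z in logits.items()}
    norm = sum(exps.values())
    return {w: e / norm for w, e in exps.items()}


def term_weight(target, context, in_vecs, out_vecs):
    """Hypothetical: use the predicted probability of `target` as its term weight."""
    return cbow_scores(context, in_vecs, out_vecs)[target]
```

Subtracting the maximum logit leaves the softmax unchanged but avoids overflow for large scores. With toy vectors in which the context `["drove"]` points towards `car`, `term_weight("car", ["drove"], in_vecs, out_vecs)` comes out larger than the weight for `banana`.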

