CL | Mar 22, 2018

Contextual Salience for Fast and Accurate Sentence Vectors

arXiv:1803.08493v62 citations
Originality: Highly original
AI Analysis

This paper addresses the need for efficient, interpretable sentence representations in language tasks such as sentiment classification, and offers a novel approach to a known bottleneck.

The paper tackled the problem of creating fast, accurate, and interpretable unsupervised sentence vectors by introducing contextual salience (CoSal), a measure that normalizes word importance based on context vector distributions. The result was a method that outperformed SkipThought on most benchmarks, beat tf-idf on all benchmarks, and was competitive with the unsupervised state-of-the-art while requiring minimal computation.

Unsupervised vector representations of sentences or documents are a major building block for many language tasks such as sentiment classification. However, current methods are uninterpretable and slow or require large training datasets. Recent word vector-based proposals implicitly assume that distances in a word embedding space are equally important, regardless of context. We introduce contextual salience (CoSal), a measure of word importance that uses the distribution of context vectors to normalize distances and weights. CoSal relies on the insight that unusual word vectors disproportionately affect phrase vectors. A bag-of-words model with CoSal-based weights produces accurate unsupervised sentence or document representations for classification, requiring little computation to evaluate and only a single covariance calculation to "train." CoSal supports small contexts and out-of-context words, outperforms SkipThought on most benchmarks, beats tf-idf on all benchmarks, and is competitive with the unsupervised state-of-the-art.
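The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes, for illustration only, that a word's salience is its Mahalanobis distance from the mean of the context word vectors, and that the sentence vector is the salience-weighted average of its word vectors. The function names (`fit_context`, `salience_weights`, `sentence_vector`), the `ridge` regularizer, and the random toy vectors are all hypothetical.

```python
import numpy as np


def fit_context(context_vectors, ridge=1e-6):
    """'Train' on a context: one mean and one covariance over its word vectors."""
    X = np.asarray(context_vectors, dtype=float)
    mean = X.mean(axis=0)
    # Small ridge term (an assumption) keeps the covariance invertible.
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + ridge * np.eye(X.shape[1])
    return mean, np.linalg.inv(cov)


def salience_weights(word_vectors, mean, cov_inv):
    """Mahalanobis distance of each word vector from the context mean."""
    diff = np.asarray(word_vectors, dtype=float) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(d2, 0.0))


def sentence_vector(word_vectors, mean, cov_inv):
    """Bag-of-words sentence vector: salience-weighted average of word vectors."""
    W = np.asarray(word_vectors, dtype=float)
    w = salience_weights(W, mean, cov_inv)
    total = w.sum()
    if total == 0.0:
        return W.mean(axis=0)  # fall back to the plain average
    return (w[:, None] * W).sum(axis=0) / total


# Toy data: 200 random "context" word vectors in 4 dimensions.
rng = np.random.default_rng(0)
context = rng.normal(size=(200, 4))
mean, cov_inv = fit_context(context)

# A sentence of three typical words plus one unusual word far from the context.
outlier = np.full(4, 10.0)
sentence = np.vstack([context[:3], outlier])

weights = salience_weights(sentence, mean, cov_inv)
weighted = sentence_vector(sentence, mean, cov_inv)
plain = sentence.mean(axis=0)
```

In this toy setup the unusual word receives the largest weight, so the weighted sentence vector sits closer to it than the unweighted average does, which mirrors the abstract's point that unusual word vectors disproportionately affect phrase vectors. The only fitted quantities are the context mean and covariance.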

Code Implementations: 1 repo
