IR · LG · Jun 27, 2012

Sequential Document Representations and Simplicial Curves

arXiv:1206.6858v1 · 16 citations
Originality: Incremental advance
AI Analysis

This work addresses a core limitation of bag-of-words models, the loss of word order, and offers a representation that could improve sequence-aware text processing.

The authors address the problem of representing documents without discarding sequential information, which bag-of-words models lose. They introduce a continuous, differentiable representation based on smooth curves in the multinomial simplex and demonstrate its applicability to text classification.

The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present a continuous and differentiable sequential document representation that goes beyond the bag of words assumption, and yet is efficient and effective. This representation employs smooth curves in the multinomial simplex to account for sequential information. We discuss the representation and its geometric properties and demonstrate its applicability for the task of text classification.
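
To make the idea of a curve in the multinomial simplex concrete, here is a minimal sketch of one way such a curve could be built: at each relative position in the document, compute a kernel-weighted word histogram and normalize it so it sums to one. The Gaussian kernel, the bandwidth `sigma`, the additive smoothing constant `alpha`, and the function names `local_histogram` and `document_curve` are all assumptions chosen for illustration; the abstract does not specify the construction.

```python
import math


def local_histogram(tokens, vocab, mu, sigma=0.1, alpha=0.01):
    """Word distribution around relative position mu in [0, 1].

    Assumed construction: Gaussian kernel weights over token positions,
    followed by additive smoothing. Returns a point in the simplex.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    n = len(tokens)
    index = {word: i for i, word in enumerate(vocab)}
    weights = [0.0] * len(vocab)
    for pos, tok in enumerate(tokens):
        t = (pos + 0.5) / n  # relative position of this token in [0, 1]
        weights[index[tok]] += math.exp(-0.5 * ((t - mu) / sigma) ** 2)
    total = sum(weights)
    # Normalize, then smooth so every coordinate stays strictly positive.
    denom = 1.0 + alpha * len(vocab)
    return [(w / total + alpha) / denom for w in weights]


def document_curve(tokens, vocab, num_points=5, sigma=0.1, alpha=0.01):
    """Sample the curve at num_points evenly spaced positions in [0, 1]."""
    mus = [i / (num_points - 1) for i in range(num_points)]
    return [local_histogram(tokens, vocab, mu, sigma, alpha) for mu in mus]


tokens = "the cat sat on the mat then the dog barked".split()
vocab = sorted(set(tokens))
curve = document_curve(tokens, vocab)
reversed_curve = document_curve(tokens[::-1], vocab)
```

Reversing the word order leaves the bag-of-words histogram unchanged, but in this sketch it reverses the sampled curve, which is the kind of sequential information a single histogram cannot capture. A larger `sigma` flattens the curve toward the global histogram, while a smaller one tracks local word usage more closely.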

