CL, Aug 18, 2013

Consensus Sequence Segmentation

arXiv:1308.3839v2
Originality: Incremental advance
AI Analysis

This paper addresses automatic text segmentation for natural language processing applications and represents an incremental improvement over existing methods.

The paper tackles unsupervised word or phrase segmentation of sequences without a known lexicon, and reports better segmentation than earlier unsupervised approaches on several benchmarks.

In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear-time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input sequence to detect the locations of word boundaries. We compare our algorithm to previous approaches from the unsupervised sequence segmentation literature and provide superior segmentation over a number of benchmarks.
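The abstract does not say which statistic the algorithm uses, so the following is only an illustrative sketch of the general idea of finding boundaries from symbol statistics, not the paper's method. It assumes a classic heuristic: estimate the transition probability between adjacent symbols and place a boundary wherever that probability dips to a local minimum. The function names `transition_probs` and `segment` are hypothetical, and both passes over the input are linear in its length.

```python
from collections import Counter


def transition_probs(seq):
    """Estimate P(next symbol | current symbol) from adjacent-pair counts."""
    pair_counts = Counter(zip(seq, seq[1:]))
    first_counts = Counter(seq[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(seq):
    """Split seq at gaps whose transition probability is a strict local minimum.

    Assumption for illustration only: a dip in transition probability marks
    a word boundary. The paper's actual statistic may differ.
    """
    if len(seq) < 2:
        return [seq] if seq else []
    probs = transition_probs(seq)
    # tp[i] scores the gap between seq[i] and seq[i + 1].
    tp = [probs[(a, b)] for a, b in zip(seq, seq[1:])]
    pieces, start = [], 0
    # Only interior gaps are tested, so each has a neighbour on both sides.
    for i in range(1, len(tp) - 1):
        if tp[i] < tp[i - 1] and tp[i] < tp[i + 1]:
            pieces.append(seq[start:i + 1])
            start = i + 1
    pieces.append(seq[start:])
    return pieces


if __name__ == "__main__":
    print(segment("abcxyzabcabcxyzxyzabc"))
```

On the toy input above, transitions inside "abc" and "xyz" always occur, while transitions across word ends vary, so the dips fall exactly between the repeated units. Real text is far noisier, and a local-minimum rule of this kind typically needs longer contexts or smoothing to be useful.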

