CL, Aug 18, 2013

Consensus Sequence Segmentation

Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

arXiv:1308.3839v2

Originality: Incremental advance

AI Analysis

The paper addresses automatic text segmentation for natural language processing applications and represents an incremental improvement over existing methods.

It tackles unsupervised segmentation of sequences into words or phrases without a known lexicon, and reports superior segmentation results on multiple benchmarks.

In this paper we introduce a method to detect words or phrases in a given sequence of symbols without knowing the lexicon. Our linear-time unsupervised algorithm relies entirely on statistical relationships among the symbols in the input sequence to detect the locations of word boundaries. We compare our algorithm to previous approaches from the unsupervised sequence segmentation literature and provide superior segmentation on a number of benchmarks.
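
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea of finding boundaries from symbol statistics, not the authors' method. It swaps in a simple, well-known heuristic: score each adjacent pair of symbols by pointwise mutual information (PMI) and cut wherever the score falls below a threshold. The function names `boundary_scores` and `segment` and the `threshold` parameter are assumptions made for this example.

```python
import math
from collections import Counter


def boundary_scores(seq):
    """Return one PMI score per adjacent symbol pair in seq.

    A low score means the two symbols co-occur less often than their
    individual frequencies would suggest, hinting at a boundary.
    This is an illustrative heuristic, not the paper's algorithm.
    """
    n = len(seq)
    pairs = list(zip(seq, seq[1:]))
    unigrams = Counter(seq)
    bigrams = Counter(pairs)
    total_pairs = max(len(pairs), 1)

    scores = []
    for a, b in pairs:
        p_ab = bigrams[(a, b)] / total_pairs
        p_a = unigrams[a] / n
        p_b = unigrams[b] / n
        scores.append(math.log(p_ab / (p_a * p_b)))
    return scores


def segment(seq, threshold=0.0):
    """Split seq wherever the adjacent-pair PMI is below threshold.

    Counting and scanning each take one pass, so the running time is
    linear in len(seq) (assuming constant-time hashing).
    """
    if not seq:
        return []
    pieces = []
    start = 0
    for i, score in enumerate(boundary_scores(seq)):
        if score < threshold:
            # Boundary falls between positions i and i + 1.
            pieces.append(seq[start:i + 1])
            start = i + 1
    pieces.append(seq[start:])
    return pieces


if __name__ == "__main__":
    text = "thedogthecatthedogthecat"
    print(segment(text, threshold=0.5))
```

The threshold controls granularity: a very low value leaves the sequence whole, and a very high value splits it into single symbols. A practical system would likely tune it on held-out data or replace it with a local-minimum rule, but that choice is outside what the abstract states.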
