CLIR | Mar 18, 2015

Text Segmentation based on Semantic Word Embeddings

arXiv:1503.05543v1 | 61 citations
Originality: Incremental advance
AI Analysis

This paper addresses text segmentation for natural language processing applications, but the contribution appears incremental because it builds on existing algorithms and benchmarks.

The paper tackles text segmentation by applying semantic word embeddings both to existing algorithms such as C99 and to new methods. Its Content Vector Segmentation (CVS) achieves state-of-the-art performance for an untrained method on the Choi test set.

We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iterative refinement technique for improving the performance of greedy strategies. We compare our results to known benchmarks, using known metrics. We demonstrate state-of-the-art performance for an untrained method with our Content Vector Segmentation (CVS) on the Choi test set. Finally, we apply the segmentation procedure to an in-the-wild dataset consisting of text extracted from scholarly articles in the arXiv.org database.
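To make the contrast between greedy and exact optimization concrete, here is a minimal sketch in pure Python. It is not the paper's implementation. The scoring function (the L1 norm of the summed word vectors in a segment) and the names `prefix_sums`, `segment_score`, `total_score`, `greedy_segment`, and `exact_segment` are assumptions chosen for illustration. The abstract only states that a class of segmentation objectives is optimized either greedily or exactly.

```python
def prefix_sums(vectors):
    """Cumulative sums, so any segment's vector sum is one subtraction."""
    dim = len(vectors[0])
    sums = [[0.0] * dim]
    for vec in vectors:
        sums.append([s + x for s, x in zip(sums[-1], vec)])
    return sums


def segment_score(sums, start, end):
    """Hypothetical score: L1 norm of the summed vectors in [start, end)."""
    return sum(abs(b - a) for a, b in zip(sums[start], sums[end]))


def total_score(sums, boundaries):
    """Sum of segment scores for a sorted list of interior boundaries."""
    edges = [0] + list(boundaries) + [len(sums) - 1]
    return sum(segment_score(sums, a, b) for a, b in zip(edges, edges[1:]))


def greedy_segment(vectors, k):
    """Top-down greedy: repeatedly add the boundary with the largest gain."""
    sums = prefix_sums(vectors)
    n = len(vectors)
    boundaries = []
    while len(boundaries) < k - 1:
        candidates = [p for p in range(1, n) if p not in boundaries]
        best = max(
            candidates,
            key=lambda p: total_score(sums, sorted(boundaries + [p])),
        )
        boundaries = sorted(boundaries + [best])
    return boundaries


def exact_segment(vectors, k):
    """Exact optimum by dynamic programming over (segments used, end)."""
    sums = prefix_sums(vectors)
    n = len(vectors)
    neg_inf = float("-inf")
    best = [[neg_inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for end in range(j, n + 1):
            for start in range(j - 1, end):
                if best[j - 1][start] == neg_inf:
                    continue  # no valid way to place j-1 segments here
                cand = best[j - 1][start] + segment_score(sums, start, end)
                if cand > best[j][end]:
                    best[j][end] = cand
                    back[j][end] = start
    # Walk the back-pointers from the last segment to recover boundaries.
    boundaries = []
    end = n
    for j in range(k, 1, -1):
        end = back[j][end]
        boundaries.append(end)
    return sorted(boundaries)


# Toy "document": six 2-d word vectors with three topical runs.
toy = [[1, 0], [1, 0], [-1, 0], [-1, 0], [0, 1], [0, 1]]
greedy_cuts = greedy_segment(toy, 3)
exact_cuts = exact_segment(toy, 3)
```

Under this assumed objective, the dynamic program can never score lower than the greedy split for the same number of segments, but it costs time quadratic in the number of positions for each segment. That cost is the motivation for trying to improve greedy solutions rather than always solving exactly.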

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
