CL · Mar 25, 2018

Text Segmentation as a Supervised Learning Task

arXiv:1803.09337v11119 citations
Originality: Incremental advance
AI Analysis

This work addresses text segmentation in natural language processing. It is rated an incremental advance because it recasts a task previously handled with unsupervised methods as a supervised learning problem.

The authors tackled the challenge of text segmentation by formulating it as a supervised learning problem, creating a large dataset from Wikipedia and developing a model that generalizes well to unseen text.

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as clustering or graph search, due to the paucity of labeled data. In this work, we formulate text segmentation as a supervised learning problem, and present a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia. Moreover, we develop a segmentation model based on this dataset and show that it generalizes well to unseen natural text.
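One way to picture the supervised formulation is as per-sentence binary labeling. The sketch below assumes a document is already split into sections and sentences (as the section headings of a Wikipedia article might provide) and marks the last sentence of each section with 1. The function names and the label convention are illustrative assumptions, not taken from the paper or its code.

```python
from typing import List, Tuple


def label_boundaries(sections: List[List[str]]) -> Tuple[List[str], List[int]]:
    """Flatten sections into one sentence list with a label per sentence.

    Assumed convention (hypothetical): 1 = sentence ends a segment, 0 = it does not.
    """
    sentences: List[str] = []
    labels: List[int] = []
    for section in sections:
        if not section:
            continue  # an empty section contributes no sentences and no boundary
        sentences.extend(section)
        labels.extend([0] * (len(section) - 1) + [1])
    return sentences, labels


def segments_from_labels(sentences: List[str], labels: List[int]) -> List[List[str]]:
    """Rebuild segments from per-sentence labels (true or predicted)."""
    segments: List[List[str]] = []
    current: List[str] = []
    for sentence, label in zip(sentences, labels):
        current.append(sentence)
        if label == 1:
            segments.append(current)
            current = []
    if current:
        segments.append(current)  # trailing sentences with no closing boundary
    return segments


if __name__ == "__main__":
    # Toy document: three non-empty sections, one empty one.
    doc = [["A1", "A2"], ["B1"], [], ["C1", "C2", "C3"]]
    sents, labs = label_boundaries(doc)
    print(list(zip(sents, labs)))
    print(segments_from_labels(sents, labs))
```

Under this framing, any sentence-level classifier can be trained on the (sentence, label) pairs, and its predictions can be turned back into segments with the second function.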

Code Implementations: 2 repos