Attention-based Neural Text Segmentation
This paper addresses text segmentation for NLP applications such as summarization and document indexing. It offers an automated method that replaces manual feature engineering and reduces memory and time requirements, though the contribution is incremental because it builds on existing neural techniques.
The paper tackles text segmentation in NLP by proposing a novel supervised neural approach using an attention-based bidirectional LSTM with CNN sentence embeddings, achieving a performance improvement of ~7% in WinDiff score on three benchmark datasets.
Text segmentation plays an important role in various Natural Language Processing (NLP) tasks such as summarization, context understanding, document indexing and document noise removal. Previous methods for this task rely on manual feature engineering and incur large memory requirements and long execution times. To the best of our knowledge, this paper is the first to present a supervised neural approach for text segmentation. Specifically, we propose an attention-based bidirectional LSTM model in which sentence embeddings are learned using CNNs and segment boundaries are predicted from contextual information. The model automatically handles variable-sized context information. Compared to existing competitive baselines, the proposed model shows a performance improvement of ~7% in WinDiff score on three benchmark datasets.
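For readers unfamiliar with the evaluation metric, the following is a minimal sketch of a WindowDiff-style score, assuming that "WinDiff" refers to the standard WindowDiff measure. It is not the authors' evaluation code. The function name window_diff, the 0/1 label encoding (1 marks a sentence followed by a segment boundary) and the example inputs are illustrative assumptions, and window-counting conventions differ slightly between implementations. Lower scores indicate better segmentations.

```python
def window_diff(reference, hypothesis, k):
    """WindowDiff-style error between two boundary sequences.

    reference, hypothesis: equal-length lists of 0/1 labels, one per sentence,
    where 1 means a segment boundary follows that sentence (assumed encoding).
    k: window size in sentences; a common convention is half the average
    reference segment length.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same length")
    n = len(reference)
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= len(reference)")

    windows = n - k + 1  # number of window positions slid over the text
    errors = 0
    for i in range(windows):
        ref_boundaries = sum(reference[i:i + k])
        hyp_boundaries = sum(hypothesis[i:i + k])
        # A window counts as an error when the two segmentations
        # disagree on how many boundaries fall inside it.
        if ref_boundaries != hyp_boundaries:
            errors += 1
    return errors / windows


# Hypothetical example: the first boundary is predicted one sentence early.
reference = [0, 0, 1, 0, 0, 1, 0, 0]
hypothesis = [0, 1, 0, 0, 0, 1, 0, 0]
score = window_diff(reference, hypothesis, k=3)
```

Because the comparison is made per window rather than per position, a boundary that is off by one sentence is penalized in only a few windows, whereas a missed boundary is penalized in every window that covers it.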