CL · Jul 30, 2014

Two-pass Discourse Segmentation with Pairing and Global Features

arXiv:1407.8215v1 · 19 citations
Originality: Incremental advance
AI Analysis

This work addresses RST-style discourse segmentation, a natural language processing task, and reports a clear but incremental improvement over existing methods.

The paper tackles RST-style discourse segmentation with a segmenter that uses two kinds of features: pairing features, centered on pairs of adjacent tokens, and global features, which encode characteristics of the segmentation as a whole. It achieves an F1 score of 92.6% for identifying in-sentence discourse boundaries, a 17.8% error-rate reduction over the previous state of the art, approaching 95% of human performance.
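To make the error-rate figure concrete, the short calculation below backs out the baseline F1 that a 17.8% error-rate reduction would imply. It is a sketch that assumes "error" means 1 minus F1, which the summary does not state explicitly; the function name `baseline_f1` is made up for illustration.

```python
def baseline_f1(new_f1: float, error_reduction: float) -> float:
    """Back out the baseline F1 implied by a relative error-rate reduction.

    Assumption: error is defined as 1 - F1 (not stated in the summary).
    """
    new_error = 1.0 - new_f1
    # new_error = old_error * (1 - error_reduction), solved for old_error
    old_error = new_error / (1.0 - error_reduction)
    return 1.0 - old_error


implied = baseline_f1(0.926, 0.178)
print(f"implied baseline F1: {implied:.3f}")  # prints 0.910
```

Under that assumption, the prior state of the art would sit at roughly 91.0% F1, so the absolute gain is about 1.6 points.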

Previous attempts at RST-style discourse segmentation typically adopt features centered on a single token to predict whether to insert a boundary before that token. In contrast, we develop a discourse segmenter utilizing a set of pairing features, which are centered on a pair of adjacent tokens in the sentence, by equally taking into account the information from both tokens. Moreover, we propose a novel set of global features, which encode characteristics of the segmentation as a whole, once we have an initial segmentation. We show that both the pairing and global features are useful on their own, and their combination achieved an $F_1$ of 92.6% in identifying in-sentence discourse boundaries, which is a 17.8% error-rate reduction over the state-of-the-art performance, approaching 95% of human performance. In addition, similar improvement is observed across different classification frameworks.
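The two-pass idea in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the feature names, the `Token` layout, and the toy rule-based classifiers are all assumptions made here, and the paper's actual feature set and learned classifiers are not reproduced.

```python
from typing import Callable, Dict, List

Token = Dict[str, str]  # hypothetical layout, e.g. {"word": "because", "pos": "IN"}
Classifier = Callable[[Dict[str, str]], bool]


def pairing_features(tokens: List[Token], i: int) -> Dict[str, str]:
    """Features for the candidate boundary between tokens[i] and tokens[i + 1].

    Both neighbours contribute equally, plus one joint feature.
    """
    left, right = tokens[i], tokens[i + 1]
    return {
        "left_word": left["word"].lower(),
        "right_word": right["word"].lower(),
        "left_pos": left["pos"],
        "right_pos": right["pos"],
        "pos_pair": left["pos"] + "_" + right["pos"],
    }


def global_features(boundaries: List[bool], i: int) -> Dict[str, str]:
    """Features derived from an initial segmentation of the whole sentence.

    boundaries[j] is True if pass one placed a boundary after token j.
    """
    before = [j for j in range(i) if boundaries[j]]
    after = [j for j in range(i + 1, len(boundaries)) if boundaries[j]]
    return {
        "dist_prev_boundary": str(i - before[-1]) if before else "none",
        "dist_next_boundary": str(after[0] - i) if after else "none",
        "num_boundaries": str(sum(boundaries)),
    }


def two_pass_segment(tokens: List[Token], first: Classifier,
                     second: Classifier) -> List[bool]:
    """Pass one uses pairing features only; pass two adds global features."""
    n = len(tokens) - 1  # one candidate boundary per adjacent token pair
    initial = [first(pairing_features(tokens, i)) for i in range(n)]
    final = []
    for i in range(n):
        feats = pairing_features(tokens, i)
        feats.update(global_features(initial, i))
        final.append(second(feats))
    return final


# Toy stand-ins for trained classifiers (hypothetical rules, not learned models).
first_pass = lambda f: f["right_word"] in {"because", "although"}
second_pass = lambda f: f["right_pos"] == "IN" and f["num_boundaries"] != "0"

sentence = [
    {"word": "He", "pos": "PRP"}, {"word": "left", "pos": "VBD"},
    {"word": "because", "pos": "IN"}, {"word": "it", "pos": "PRP"},
    {"word": "rained", "pos": "VBD"},
]
print(two_pass_segment(sentence, first_pass, second_pass))
# prints [False, True, False, False]
```

In a real system the two classifiers would be trained models, and the second pass would see whatever segmentation-level statistics prove useful; the point here is only the data flow from an initial segmentation into the second-pass feature vector.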

Code Implementations: 4 repos
