CL, Jul 30, 2014

Two-pass Discourse Segmentation with Pairing and Global Features

arXiv:1407.8215v1, 19 citations

Originality: Incremental advance

AI Analysis

This work addresses RST-style discourse segmentation in natural language processing and reports an incremental but measurable improvement over existing methods.

The paper addresses RST-style discourse segmentation with a segmenter built on two feature sets: pairing features, which are centered on pairs of adjacent tokens, and global features, which encode characteristics of the segmentation as a whole. The combination achieves an F1 score of 92.6% for identifying in-sentence discourse boundaries, a 17.8% error-rate reduction over the previous state of the art, approaching 95% of human performance.
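To make the 17.8% error-rate reduction concrete, the arithmetic below backs out the baseline F1 that the reported figures imply. It is a sketch under one assumption that the text does not state explicitly: that "error" means 100 minus F1, in percentage points. The function name `implied_baseline_f1` is made up for illustration.

```python
def implied_baseline_f1(new_f1: float, relative_error_reduction: float) -> float:
    """Back out the baseline F1 implied by a relative error-rate reduction.

    Assumption (not stated in the text): error = 100 - F1, in percentage points.
    """
    new_error = 100.0 - new_f1
    # new_error = old_error * (1 - reduction)  =>  old_error = new_error / (1 - reduction)
    old_error = new_error / (1.0 - relative_error_reduction)
    return 100.0 - old_error


baseline = implied_baseline_f1(92.6, 0.178)
print(round(baseline, 1))  # about 91.0
```

Under that assumption, the previous state of the art would sit at roughly 91.0% F1, so the absolute gain is about 1.6 points.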

Previous attempts at RST-style discourse segmentation typically adopt features centered on a single token to predict whether to insert a boundary before that token. In contrast, we develop a discourse segmenter utilizing a set of pairing features, which are centered on a pair of adjacent tokens in the sentence and take into account the information from both tokens equally. Moreover, we propose a novel set of global features, which encode characteristics of the segmentation as a whole, once we have an initial segmentation. We show that both the pairing and global features are useful on their own, and that their combination achieves an $F_1$ of 92.6% for identifying in-sentence discourse boundaries, which is a 17.8% error-rate reduction over the state-of-the-art performance, approaching 95% of human performance. In addition, similar improvement is observed across different classification frameworks.
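The two-pass idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the specific feature names, the distance-based global features, and the rule-based stand-in classifiers are all assumptions chosen for brevity. The abstract only says that pairing features are centered on two adjacent tokens and that global features are computed once an initial segmentation exists.

```python
from typing import Callable, Dict, List

Features = Dict[str, object]


def pairing_features(tokens: List[str], pos: List[str], i: int) -> Features:
    """Features for the candidate boundary between tokens[i] and tokens[i + 1].

    Both tokens contribute symmetrically (hypothetical feature set).
    """
    return {
        "left_word": tokens[i].lower(),
        "right_word": tokens[i + 1].lower(),
        "left_pos": pos[i],
        "right_pos": pos[i + 1],
        "pos_pair": pos[i] + "_" + pos[i + 1],
    }


def global_features(boundaries: List[bool], i: int) -> Features:
    """Features describing the initial segmentation around gap i (hypothetical).

    boundaries[j] is True if the first pass placed a boundary at gap j.
    """
    n = len(boundaries)
    # Distance to the nearest first-pass boundary on each side; fall back to
    # the distance to the sentence edge when there is none.
    left = next((i - j for j in range(i - 1, -1, -1) if boundaries[j]), i + 1)
    right = next((j - i for j in range(i + 1, n) if boundaries[j]), n - i)
    return {
        "dist_to_left_boundary": left,
        "dist_to_right_boundary": right,
        "num_boundaries": sum(boundaries),
    }


def two_pass_segment(
    tokens: List[str],
    pos: List[str],
    first_pass: Callable[[Features], bool],
    second_pass: Callable[[Features], bool],
) -> List[bool]:
    """Return one boundary decision per gap between adjacent tokens."""
    gaps = range(len(tokens) - 1)
    local = [pairing_features(tokens, pos, i) for i in gaps]
    # Pass 1: pairing features only, giving an initial segmentation.
    initial = [first_pass(f) for f in local]
    # Pass 2: pairing features plus global features from the initial segmentation.
    return [second_pass({**local[i], **global_features(initial, i)}) for i in gaps]


# Toy rule-based classifiers standing in for trained models.
tokens = ["He", "left", "because", "it", "rained", "."]
pos = ["PRP", "VBD", "IN", "PRP", "VBD", "."]
first = lambda f: f["right_word"] == "because"
second = lambda f: f["right_word"] == "because" and f["num_boundaries"] >= 1
result = two_pass_segment(tokens, pos, first, second)
print(result)  # [False, True, False, False, False]
```

In a real system both passes would be trained classifiers; the point of the sketch is only the data flow, in which the second pass sees the first pass's output as additional features.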
