Neural Generative Rhetorical Structure Parsing
This work addresses rhetorical structure (RST) parsing, which supports document-level tasks such as summarization, and offers a generative approach that is more sample efficient on the small datasets typical of the task. The contribution is incremental in that it builds on the existing RNN grammar framework.
The authors introduce the first generative model for RST parsing, a document-level RNN grammar with a bottom-up traversal order, together with a novel beam search algorithm. The new beam search improves unlabelled and labelled F1 by 6.8 and 2.9 points (absolute) over previous beam search algorithms for RNNGs, and the generative model outperforms a discriminative model with the same features by 2.6 F1 points.
Rhetorical structure trees have been shown to be useful for several document-level tasks including summarization and document classification. Previous approaches to RST parsing have used discriminative models; however, these are less sample efficient than generative models, and RST parsing datasets are typically small. In this paper, we present the first generative model for RST parsing. Our model is a document-level RNN grammar (RNNG) with a bottom-up traversal order. We show that, for our parser's traversal order, previous beam search algorithms for RNNGs have a left-branching bias which is ill-suited for RST parsing. We develop a novel beam search algorithm that keeps track of both structure- and word-generating actions without exhibiting this branching bias and results in absolute improvements of 6.8 and 2.9 on unlabelled and labelled F1 over previous algorithms. Overall, our generative model outperforms a discriminative model with the same features by 2.6 F1 points and achieves performance comparable to the state-of-the-art, outperforming all published parsers from a recent replication study that do not use additional training data.
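To make the bottom-up traversal order and the notion of branching direction concrete, the following is a minimal sketch, not the paper's implementation. It assumes two hypothetical action names, SHIFT (move the next discourse unit onto a stack) and REDUCE (merge the top two stack items into one subtree), and it ignores relation labels, word generation, and model scores. The names `derive`, `edus`, and the unit strings are illustrative only.

```python
from typing import List, Tuple, Union

# A tree is either a leaf (a discourse unit) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def derive(edus: List[str], actions: List[str]) -> Tree:
    """Replay hypothetical SHIFT/REDUCE actions bottom-up and return the binary tree."""
    stack: List[Tree] = []
    buffer = list(edus)
    for action in actions:
        if action == "SHIFT":
            # Move the next unit from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "REDUCE":
            # Merge the two topmost subtrees into a new parent node.
            right = stack.pop()
            left = stack.pop()
            stack.append((left, right))
        else:
            raise ValueError(f"unknown action: {action}")
    if buffer or len(stack) != 1:
        raise ValueError("action sequence does not yield a single complete tree")
    return stack[0]


edus = ["e1", "e2", "e3"]

# Reducing as early as possible gives a left-branching tree.
left_branching = derive(edus, ["SHIFT", "SHIFT", "REDUCE", "SHIFT", "REDUCE"])

# Shifting everything first gives a right-branching tree.
right_branching = derive(edus, ["SHIFT", "SHIFT", "SHIFT", "REDUCE", "REDUCE"])

print(left_branching)
print(right_branching)
```

Both sequences contain the same number of each action and differ only in where the REDUCE steps fall, so how a beam search groups and compares partial hypotheses can plausibly favour one tree shape over the other.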