CLAI Dec 12, 2021

Predicting Above-Sentence Discourse Structure using Distant Supervision from Topic Segmentation

arXiv:2112.06196v1 | 8 citations
Originality: Incremental advance
AI Analysis

This work addresses the lack of large-scale datasets for discourse parsing, a task that underpins many NLP applications. It is an incremental advance that uses topic segmentation as a complementary distant-supervision signal.

The paper tackles data sparsity in RST-style discourse parsing with distant supervision from topic segmentation. On two human-annotated discourse treebanks, the resulting tree structures consistently outperform previous distantly supervised models on the sentence-to-document task and occasionally score even higher on the sentence-to-paragraph level.

RST-style discourse parsing plays a vital role in many NLP tasks, revealing the underlying semantic/pragmatic structure of potentially complex and diverse documents. Despite its importance, one of the most prevailing limitations in modern day discourse parsing is the lack of large-scale datasets. To overcome the data sparsity issue, distantly supervised approaches from tasks like sentiment analysis and summarization have been recently proposed. Here, we extend this line of research by exploiting distant supervision from topic segmentation, which can arguably provide a strong and oftentimes complementary signal for high-level discourse structures. Experiments on two human-annotated discourse treebanks confirm that our proposal generates accurate tree structures on sentence and paragraph level, consistently outperforming previous distantly supervised models on the sentence-to-document task and occasionally reaching even higher scores on the sentence-to-paragraph level.
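
To make the idea of deriving tree structure from topic segmentation concrete, here is a minimal sketch. It assumes that a topic segmenter has already produced one score per gap between adjacent sentences, and that a binary tree is built greedily top-down by splitting each span at its highest-scoring gap. The abstract does not spell out the procedure, so the function name `build_tree`, the score format, and the greedy splitting rule are illustrative assumptions rather than the authors' implementation.

```python
from typing import List, Optional, Tuple, Union

# A leaf is a sentence index; an internal node is a (left, right) pair.
Tree = Union[int, Tuple["Tree", "Tree"]]


def build_tree(boundary_scores: List[float],
               start: int = 0,
               end: Optional[int] = None) -> Tree:
    """Greedy top-down binary tree over sentences [start, end).

    boundary_scores[i] is a hypothetical segmenter's score that a topic
    break falls between sentence i and sentence i + 1 (an assumption
    made for this sketch, not a format taken from the paper).
    """
    if end is None:
        end = len(boundary_scores) + 1  # n gaps imply n + 1 sentences
    if end - start == 1:
        return start  # a single sentence is a leaf
    # Split the span at the gap with the highest topic-break score.
    # Ties go to the leftmost gap because max() returns the first maximum.
    split = max(range(start, end - 1), key=lambda k: boundary_scores[k])
    left = build_tree(boundary_scores, start, split + 1)
    right = build_tree(boundary_scores, split + 1, end)
    return (left, right)


if __name__ == "__main__":
    # Four sentences with the strongest break between sentences 1 and 2.
    print(build_tree([0.1, 0.9, 0.2]))
```

In this sketch the strongest topic break becomes the root split, so spans separated by weaker breaks end up lower in the tree. A real system would take its scores from a trained segmentation model and might use a different aggregation rule.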
