Large Discourse Treebanks from Scalable Distant Supervision
This work addresses a data bottleneck for NLP researchers and practitioners by providing scalable training data for discourse parsing, though the contribution is incremental because it builds on existing distant supervision methods.
The paper tackles the problem of limited training data for discourse parsing by proposing a framework that generates large-scale "silver-standard" discourse treebanks using distant supervision from sentiment analysis, enabling parsers to be trained on more diverse and domain-independent datasets.
Discourse parsing is an essential upstream task in Natural Language Processing with strong implications for many real-world applications. Despite its widely recognized role, most recent discourse parsers (and consequently downstream tasks) still rely on small-scale human-annotated discourse treebanks, trying to infer general-purpose discourse structures from very limited data in a few narrow domains. To overcome this dire situation and allow discourse parsers to be trained on larger, more diverse and domain-independent datasets, we propose a framework to generate "silver-standard" discourse trees from distant supervision on the auxiliary task of sentiment analysis.