CL Oct 7, 2025

Bridging Discourse Treebanks with a Unified Rhetorical Structure Parser

arXiv:2510.06427v1
1 citation
h-index: 5
Proceedings of the 6th Workshop on Computational Approaches to Discourse, Context and Document-Level Inferences (CODI 2025)
Originality: Highly original
AI Analysis

This work addresses the challenge of multilingual discourse parsing for NLP researchers by providing a single-model solution that improves performance across diverse resources.

The authors tackled the problem of incompatible rhetorical structure treebanks across languages by introducing UniRST, a unified parser that handles 18 treebanks in 11 languages without modifying relation inventories, and showed that it outperforms 16 of 18 mono-treebank baselines.

We introduce UniRST, the first unified RST-style discourse parser capable of handling 18 treebanks in 11 languages without modifying their relation inventories. To overcome inventory incompatibilities, we propose and evaluate two training strategies: Multi-Head, which assigns a separate relation classification layer to each inventory, and Masked-Union, which enables shared-parameter training through selective label masking. We first benchmark mono-treebank parsing with a simple yet effective augmentation technique for low-resource settings. We then train a unified model and show that (1) the parameter-efficient Masked-Union approach is also the strongest, and (2) UniRST outperforms 16 of 18 mono-treebank baselines, demonstrating the advantages of single-model, multilingual, end-to-end discourse parsing across diverse resources.
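
The Masked-Union idea can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "selective label masking" means a single classifier scores the union of all relation labels and that labels absent from the current treebank's inventory are excluded before the softmax. The inventories, treebank names, and function names below are hypothetical.

```python
import math

# Hypothetical toy inventories; real treebanks have far larger label sets.
INVENTORIES = {
    "treebank_a": ["Elaboration", "Contrast", "Cause"],
    "treebank_b": ["Elaboration", "Condition"],
}

# Union of all labels in a fixed order, scored by one shared classification layer.
UNION = sorted({label for labels in INVENTORIES.values() for label in labels})


def label_mask(treebank):
    """Return True for each union label that exists in the treebank's inventory."""
    allowed = set(INVENTORIES[treebank])
    return [label in allowed for label in UNION]


def masked_softmax(logits, mask):
    """Softmax over allowed labels only; disallowed labels get probability 0."""
    allowed_logits = [x for x, keep in zip(logits, mask) if keep]
    peak = max(allowed_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) if keep else 0.0 for x, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, treebank):
    """Pick the most probable relation among those the treebank actually uses."""
    probs = masked_softmax(logits, label_mask(treebank))
    best = max(range(len(UNION)), key=probs.__getitem__)
    return UNION[best], probs


# Same logits, different treebanks: the mask decides which labels compete.
logits = [0.1, 5.0, 0.3, 1.2]  # aligned with UNION
label_a, _ = predict(logits, "treebank_a")
label_b, _ = predict(logits, "treebank_b")
```

In this sketch every parameter of the scoring layer is shared across treebanks and only the mask changes. A Multi-Head variant would instead keep one output layer per inventory and select the layer by treebank.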

