Bilingual Rhetorical Structure Parsing with Large Parallel Annotations
This work addresses the shortage of parallel data available to discourse parsing researchers, though the contribution is incremental, as it builds on existing RST frameworks.
The paper tackles cross-lingual discourse parsing by introducing a parallel Russian annotation for the English GUM RST corpus, achieving state-of-the-art results on both English and Russian corpora and transferring effectively in bilingual settings.
Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in how Rhetorical Structure Theory (RST) is applied across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It is effective in both monolingual and bilingual settings, transferring successfully even with limited second-language annotation. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.