CL · Sep 23, 2024

Bilingual Rhetorical Structure Parsing with Large Parallel Annotations

arXiv:2409.14969v1 · 28 citations · h-index: 5
Originality: Incremental advance
AI Analysis

This work addresses the shortage of parallel data faced by discourse parsing researchers, though it is an incremental advance that builds on existing RST frameworks.

The paper tackles cross-lingual discourse parsing by introducing a parallel Russian annotation of the English GUM RST corpus. Its parser achieves state-of-the-art results on both the English and Russian corpora and transfers effectively in bilingual settings.

Discourse parsing is a crucial task in natural language processing that aims to reveal the higher-level relations in a text. Despite growing interest in cross-lingual discourse parsing, challenges persist due to limited parallel data and inconsistencies in how Rhetorical Structure Theory (RST) is applied across languages and corpora. To address this, we introduce a parallel Russian annotation for the large and diverse English GUM RST corpus. Leveraging recent advances, our end-to-end RST parser achieves state-of-the-art results on both English and Russian corpora. It is effective in both monolingual and bilingual settings, transferring successfully even with limited second-language annotation. To the best of our knowledge, this work is the first to evaluate the potential of cross-lingual end-to-end RST parsing on a manually annotated parallel corpus.
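To make "revealing higher-level relations" concrete: an RST parser outputs a tree over elementary discourse units (EDUs), where each constituent is marked as nucleus (N) or satellite (S) and carries a relation label. The sketch below is a hypothetical illustration, not the authors' parser or evaluation code. It represents such a tree (the `RSTNode` class and the relation labels are invented for the example) and scores a predicted tree against a gold one with a span-matching F1 in the spirit of RST-Parseval. It assumes gold and predicted trees share the same EDU segmentation; an end-to-end parser also predicts segmentation, so real evaluations differ. It also counts every non-root constituent, leaves included, and published evaluations vary on such conventions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class RSTNode:
    """Hypothetical RST constituent covering EDUs start..end (inclusive)."""
    start: int
    end: int
    nuclearity: Optional[str] = None  # "N" or "S" w.r.t. the parent; None for the root
    relation: Optional[str] = None    # relation label, e.g. "elaboration"
    children: List["RSTNode"] = field(default_factory=list)


def spans(tree: RSTNode, labeled: bool = False) -> Set[Tuple]:
    """Collect all non-root constituents as (start, end[, nuclearity, relation])."""
    out: Set[Tuple] = set()

    def walk(node: RSTNode, is_root: bool) -> None:
        if not is_root:
            if labeled:
                out.add((node.start, node.end, node.nuclearity, node.relation))
            else:
                out.add((node.start, node.end))
        for child in node.children:
            walk(child, False)

    walk(tree, True)
    return out


def f1(gold: RSTNode, pred: RSTNode, labeled: bool = False) -> float:
    """Span-matching F1 between a gold and a predicted tree (assumed shared EDUs)."""
    g, p = spans(gold, labeled), spans(pred, labeled)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


# Three EDUs (0, 1, 2); the gold and predicted trees group them differently.
gold = RSTNode(0, 2, children=[
    RSTNode(0, 1, "N", "span", children=[
        RSTNode(0, 0, "N", "span"),
        RSTNode(1, 1, "S", "attribution"),
    ]),
    RSTNode(2, 2, "S", "elaboration"),
])
pred = RSTNode(0, 2, children=[
    RSTNode(0, 0, "N", "span"),
    RSTNode(1, 2, "S", "elaboration", children=[
        RSTNode(1, 1, "N", "span"),
        RSTNode(2, 2, "S", "elaboration"),
    ]),
])

print(f1(gold, pred))                # 0.75: 3 of 4 spans match
print(f1(gold, pred, labeled=True))  # 0.5: only 2 spans also match N/S and relation
```

The labeled score drops below the unlabeled one because a constituent only counts when its boundaries, nuclearity, and relation all agree; this is why RST results are usually reported at several levels (span, nuclearity, relation).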

Code Implementations: 1 repo
