CL | Dec 3, 2020

Multilingual Neural RST Discourse Parsing

arXiv:2012.01704v1992 citations
AI Analysis

This work offers two approaches for improving RST discourse parsing in low-resource languages, which matters for researchers and applications that analyze multilingual text.

The authors tackle the problem of multilingual RST discourse parsing, which is challenging due to the shortage of annotated data in languages other than English. They achieve state-of-the-art performance on cross-lingual, document-level discourse parsing across all sub-tasks by utilizing multilingual vector representations and adopting segment-level translation.

Text discourse parsing plays an important role in understanding information flow and argumentative structure in natural language. Previous research under Rhetorical Structure Theory (RST) has mostly focused on inducing and evaluating models from the English treebank. However, parsing other languages such as German, Dutch, and Portuguese remains challenging due to the shortage of annotated data. In this work, we investigate two approaches to establishing a neural, cross-lingual discourse parser: (1) utilizing multilingual vector representations; and (2) adopting segment-level translation of the source content. Experimental results show that both methods are effective even with limited training data, and achieve state-of-the-art performance on cross-lingual, document-level discourse parsing on all sub-tasks.
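To make approach (2) concrete, here is a minimal sketch of what segment-level translation could look like, assuming each discourse segment (EDU) is translated independently so that the annotated tree structure carries over unchanged. The `Node` class, `translate_segments`, `leaves`, and the toy lexicon are hypothetical names introduced for illustration only; they are not taken from the paper or its code, and a real system would call a machine translation model where `toy_translate` stands in.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Binary RST-style tree node (hypothetical structure for illustration)."""
    text: Optional[str] = None        # set only on leaves (one EDU each)
    relation: Optional[str] = None    # set only on internal nodes
    nuclearity: Optional[str] = None  # e.g. "NS", "SN", "NN"
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def translate_segments(node: Node, translate: Callable[[str], str]) -> Node:
    """Return a copy of the tree with every leaf translated on its own.

    Translating segment by segment keeps the number and order of EDUs
    fixed, so relation and nuclearity labels remain aligned with the text.
    """
    if node.text is not None:
        return Node(text=translate(node.text))
    return Node(
        relation=node.relation,
        nuclearity=node.nuclearity,
        left=translate_segments(node.left, translate),
        right=translate_segments(node.right, translate),
    )


def leaves(node: Node) -> List[str]:
    """Collect EDU texts in left-to-right order."""
    if node.text is not None:
        return [node.text]
    return leaves(node.left) + leaves(node.right)


# Stand-in for a real MT system: a tiny lookup table (assumption).
TOY_LEXICON = {
    "Das Wetter war schlecht,": "The weather was bad,",
    "also blieben wir zu Hause.": "so we stayed home.",
}


def toy_translate(segment: str) -> str:
    return TOY_LEXICON.get(segment, segment)


# Labels below are illustrative, not gold annotations.
tree = Node(
    relation="Cause",
    nuclearity="SN",
    left=Node(text="Das Wetter war schlecht,"),
    right=Node(text="also blieben wir zu Hause."),
)
translated = translate_segments(tree, toy_translate)
```

The design choice to illustrate is that translating the whole document at once could merge or reorder clauses and break the alignment between text and tree, whereas per-segment translation leaves the segmentation and labels untouched.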

Code Implementations: 1 repo