CL, Jan 11, 2017

Cross-lingual RST Discourse Parsing

arXiv:1701.02946v1, 84 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of discourse parsing for multiple languages, enabling better understanding of information flow in non-English documents, though it is incremental in extending existing methods to new data.

The paper tackles cross-lingual discourse parsing in three steps. It introduces a new parser that is simpler than, yet competitive with, the state of the art for English (significantly better on 2 of 3 metrics). It harmonizes discourse treebanks across languages. It then uses the harmonized treebanks to run what the authors believe are the first experiments on cross-lingual discourse parsing.

Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic theory, but differ slightly in the way documents are annotated. In this paper, we present (a) a new discourse parser which is simpler than, yet competitive with, the state of the art for English (significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages, enabling us to present (c) what are, to the best of our knowledge, the first experiments on cross-lingual discourse parsing.
