CL · Jul 20, 2017

Improving Discourse Relation Projection to Build Discourse Annotated Corpora

arXiv:1707.06357v1 · 1086 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of building discourse-annotated corpora for low-resource languages such as French. It is an incremental advance, since it builds on existing annotation-projection methods.

The paper tackles the projection of discourse annotations across languages. It uses the intersection of statistical word-alignment models to identify unsupported annotations, catching 65% of them in English-French parallel sentences from Europarl. Filtering these out yields the first PDTB-style discourse-annotated corpus for French, and a classifier of French discourse connectives trained on it shows a 15% improvement in F1-score over one trained on the unfiltered annotations.

The naive approach to annotation projection is not effective for projecting discourse annotations from one language to another, because implicit discourse relations are often changed to explicit ones, and vice versa, in translation. In this paper, we propose a novel approach based on the intersection between statistical word-alignment models to identify unsupported discourse annotations. This approach identified 65% of the unsupported annotations in the English-French parallel sentences from Europarl. By filtering out these unsupported annotations, we induced the first PDTB-style discourse-annotated corpus for French from Europarl. We then used this corpus to train a classifier to identify the discourse usage of French discourse connectives and show a 15% improvement in F1-score compared to the classifier trained on the non-filtered annotations.
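The filtering idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the representation of alignments as sets of index pairs, and the support criterion (an English connective counts as supported if at least one of its tokens has a link that both directional alignment models agree on) are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: a word alignment is a set of index pairs.
Alignment = Set[Tuple[int, int]]


def intersect_alignments(en_to_fr: Alignment, fr_to_en: Alignment) -> Alignment:
    """Keep only the links proposed by both directional models.

    en_to_fr holds (english_index, french_index) pairs;
    fr_to_en holds (french_index, english_index) pairs.
    """
    flipped = {(e, f) for (f, e) in fr_to_en}
    return en_to_fr & flipped


def is_supported(connective: range, links: Alignment) -> bool:
    """Assumed criterion: at least one connective token keeps a link."""
    return any(e in connective for (e, _) in links)


def filter_annotations(
    connectives: List[range], en_to_fr: Alignment, fr_to_en: Alignment
) -> Tuple[List[range], List[range]]:
    """Split English connective spans into supported and unsupported ones."""
    links = intersect_alignments(en_to_fr, fr_to_en)
    kept: List[range] = []
    dropped: List[range] = []
    for span in connectives:
        (kept if is_supported(span, links) else dropped).append(span)
    return kept, dropped


# Toy example: the connective at English token 0 is linked in both
# directions; the one at token 4 is linked by only one model.
en_to_fr = {(0, 0), (1, 1), (4, 3)}
fr_to_en = {(0, 0), (1, 1)}
kept, dropped = filter_annotations([range(0, 1), range(4, 5)], en_to_fr, fr_to_en)
```

In this toy case the first connective is kept and the second is dropped, mirroring the case where a connective has no reliable counterpart in the translation (for example, an explicit relation rendered implicitly). Intersection favours precision over recall, which suits a filter meant to remove noisy projected labels.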

Code Implementations: 1 repo
