CL May 14, 2018

Discourse Coherence in the Wild: A Dataset, Evaluation and Methods

arXiv:1805.04993v132.11100 citations Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the evaluation of discourse coherence in real-world texts, a problem relevant to NLP researchers. The contribution is incremental in that it builds on existing methods, adding new data and benchmarks.

The paper addresses the lack of real-world evaluation for discourse coherence methods by introducing a new corpus (GCDC) and conducting a large-scale evaluation of leading algorithms. Neural models, including the two introduced in the paper (SentAvg and ParSeq), tend to perform best. The authors analyze these performance differences and discuss patterns observed in low-coherence texts in four domains.

To date there has been very little work on assessing discourse coherence methods on real-world data. To address this, we present a new corpus of real-world texts (GCDC) as well as the first large-scale evaluation of leading discourse coherence algorithms. We show that neural models, including two that we introduce here (SentAvg and ParSeq), tend to perform best. We analyze these performance differences and discuss patterns we observed in low coherence texts in four domains.
