CL | Feb 15, 2024

Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence

Yinhong Liu, Yixuan Su, Ehsan Shareghi, Nigel Collier

Cambridge

arXiv:2402.10175v2 | 16.433 citations | h-index: 19 | Has Code | NAACL

Originality: Incremental advance

AI Analysis

This work addresses the need for better automatic evaluation of discourse coherence in LLM-generated long-form text. The advance is incremental because it builds on existing discourse coherence research.

The paper tackles the problem of evaluating discourse coherence in long-form text generation by introducing PDD, an automatic metric that quantifies discourse divergence between articles. The results show that PDD aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing methods on three datasets.

Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. In long-form text generation, there has been growing interest in evaluating generation from a discourse coherence perspective. However, existing lexical or semantic metrics such as BLEU, ROUGE, and BERTScore cannot effectively capture discourse coherence. The development of discourse-specific automatic evaluation methods for assessing the output of LLMs warrants greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing evaluation methods.
