CL · Feb 15, 2024

Unlocking Structure Measuring: Introducing PDD, an Automatic Metric for Positional Discourse Coherence

Cambridge
arXiv:2402.10175v2 · 33 citations · h-index: 19 · NAACL
Originality: Incremental advance
AI Analysis

The paper addresses the need for better automatic evaluation of discourse coherence in long-form text generated by LLMs. The contribution is incremental: it builds on existing discourse coherence research.

The paper tackles the problem of evaluating discourse coherence in long-form text generation by introducing PDD, an automatic metric that quantifies the discourse divergence between two long-form articles. On three datasets, PDD aligns more closely with human preferences and with GPT-4 coherence evaluations than existing methods do.

Recent large language models (LLMs) have shown remarkable performance in aligning generated text with user intentions across various tasks. For long-form text generation, there has been growing interest in evaluating output from a discourse coherence perspective. However, existing lexical or semantic metrics such as BLEU, ROUGE, and BERTScore cannot effectively capture discourse coherence. Discourse-specific automatic methods for evaluating LLM output therefore warrant greater focus and exploration. In this paper, we present a novel automatic metric designed to quantify the discourse divergence between two long-form articles. Extensive experiments on three datasets from representative domains demonstrate that our metric aligns more closely with human preferences and GPT-4 coherence evaluation, outperforming existing evaluation methods.
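The summary does not spell out how PDD computes divergence, so the following is only a hedged illustration of what a *positional* discourse divergence between two articles could look like, not the paper's formula. It assumes every sentence already carries a discourse-role label from some classifier (not shown). Each article is split into `k` equal positional segments, the role distributions of matching segments are compared with Jensen-Shannon divergence, and the results are averaged. The function names, role labels, segment count `k`, and smoothing constant `alpha` are all hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def _distribution(roles: Sequence[str], labels: Sequence[str], alpha: float = 1e-6) -> list[float]:
    """Smoothed relative frequency of each discourse label within one segment.

    Hypothetical additive smoothing (alpha) keeps every probability positive,
    so empty segments become a uniform distribution instead of dividing by zero.
    """
    counts = Counter(roles)
    total = len(roles) + alpha * len(labels)
    return [(counts[label] + alpha) / total for label in labels]


def _js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, so the result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def _segments(roles: Sequence[str], k: int) -> list[Sequence[str]]:
    """Split a role sequence into k contiguous, roughly equal positional chunks."""
    n = len(roles)
    return [roles[i * n // k:(i + 1) * n // k] for i in range(k)]


def positional_divergence(
    gen_roles: Sequence[str],
    ref_roles: Sequence[str],
    labels: Sequence[str],
    k: int = 3,
) -> float:
    """Hypothetical positional discourse divergence (assumption, not PDD itself).

    Compares the generated and reference articles segment by segment
    (beginning vs beginning, middle vs middle, ...) and averages the scores.
    """
    scores = [
        _js_divergence(_distribution(g, labels), _distribution(r, labels))
        for g, r in zip(_segments(gen_roles, k), _segments(ref_roles, k))
    ]
    return sum(scores) / k


if __name__ == "__main__":
    labels = ["Main", "Context", "Consequence"]  # hypothetical role set
    gen = ["Main", "Main", "Context", "Context", "Consequence", "Consequence"]
    print(positional_divergence(gen, gen, labels))        # identical order
    print(positional_divergence(gen, gen[::-1], labels))  # same roles, reversed order
```

The reversed article contains exactly the same role counts as the original, so a bag-of-roles comparison would call the two identical; splitting by position is what lets the score register that the main event moved from the opening to the end.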

Code implementations: 1 repo