CL | AI | Feb 10, 2025

Discourse-Driven Evaluation: Unveiling Factual Inconsistency in Long Document Summarization

arXiv:2502.06185v1 | 12 citations | h-index: 4 | NAACL
Originality: Highly original
AI Analysis

This work addresses factual inconsistency in long document summarization, a problem that matters for applications requiring accurate summaries of lengthy texts. It presents an incremental improvement over existing methods.

The authors study how to detect factual inconsistency in long document summarization. They find that errors are more common in complex sentences and are associated with several discourse features. Their approach improves performance on top of different model baselines across several evaluation benchmarks.

Detecting factual inconsistency for long document summarization remains challenging, given the complex structure of the source article and long summary length. In this work, we study factual inconsistency errors and connect them with a line of discourse analysis. We find that errors are more common in complex sentences and are associated with several discourse features. We propose a framework that decomposes long texts into discourse-inspired chunks and utilizes discourse information to better aggregate sentence-level scores predicted by natural language inference models. Our approach shows improved performance on top of different model baselines over several evaluation benchmarks, covering rich domains of texts, focusing on long document summarization. This underscores the significance of incorporating discourse features in developing models for scoring summaries for long document factual inconsistency.
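The pipeline shape described in the abstract (chunk the source, score each summary sentence, then aggregate with weights) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: `chunk_source` groups sentences by a fixed count as a stand-in for discourse-inspired segmentation, `overlap_score` is a toy token-overlap function standing in for a natural language inference model, taking the maximum score over chunks is one common choice rather than the paper's stated rule, and the weights passed to `weighted_aggregate` are hypothetical placeholders for discourse-derived importance.

```python
from typing import Callable, List, Sequence


def chunk_source(sentences: Sequence[str], max_sentences: int = 3) -> List[str]:
    """Group consecutive source sentences into chunks.

    Assumption: a fixed-size window stands in for discourse-inspired
    segmentation, which would need a discourse parser.
    """
    if max_sentences < 1:
        raise ValueError("max_sentences must be at least 1")
    return [
        " ".join(sentences[i:i + max_sentences])
        for i in range(0, len(sentences), max_sentences)
    ]


def overlap_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI entailment probability.

    Returns the fraction of hypothesis tokens that also occur in the premise.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(1 for token in hypothesis_tokens if token in premise_tokens)
    return hits / len(hypothesis_tokens)


def sentence_scores(
    chunks: Sequence[str],
    summary_sentences: Sequence[str],
    scorer: Callable[[str, str], float] = overlap_score,
) -> List[float]:
    """Score each summary sentence by its best-supporting source chunk."""
    if not chunks:
        raise ValueError("at least one source chunk is required")
    return [
        max(scorer(chunk, sentence) for chunk in chunks)
        for sentence in summary_sentences
    ]


def weighted_aggregate(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted mean of sentence-level scores.

    Assumption: the weights are hypothetical stand-ins for discourse-derived
    importance (for example, a larger weight for more complex sentences).
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(s * w for s, w in zip(scores, weights)) / total


if __name__ == "__main__":
    source = ["a b", "c d", "e f", "g h"]
    summary = ["a b", "e x"]
    chunks = chunk_source(source, max_sentences=2)
    scores = sentence_scores(chunks, summary)
    # Give the second summary sentence three times the weight of the first.
    print(weighted_aggregate(scores, [1.0, 3.0]))
```

To try this with a real model, the `scorer` argument could be replaced by any function that maps a (premise, hypothesis) pair to an entailment probability, leaving the chunking and aggregation steps unchanged.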

