CL · Feb 27, 2024

Fine-Grained Natural Language Inference Based Faithfulness Evaluation for Diverse Summarisation Tasks

Huajian Zhang, Yumo Xu, Laura Perez-Beltrachini

arXiv:2402.17630v1 · 27.9112 citations · h-index: 18 · Has Code · EACL

Originality: Incremental advance

AI Analysis

The paper addresses the need for better faithfulness evaluation in summarization, particularly for long-form and diverse tasks. The advance is incremental because it builds on existing NLI-based methods.

The paper tackles the problem of evaluating summary faithfulness with off-the-shelf Natural Language Inference (NLI) models. It argues that existing NLI-based approaches are suboptimal because they fix the granularity of premises and hypotheses, and it proposes InFusE, which uses a variable premise size and simplifies summary sentences into shorter hypotheses. InFusE achieves superior performance across diverse summarization tasks.

We study existing approaches that leverage off-the-shelf Natural Language Inference (NLI) models for the evaluation of summary faithfulness and argue that these are sub-optimal due to the granularity level considered for premises and hypotheses. That is, the smallest content unit considered as a hypothesis is a sentence, and premises are made up of a fixed number of document sentences. We propose a novel approach, namely InFusE, that uses a variable premise size and simplifies summary sentences into shorter hypotheses. Departing from previous studies, which focus on single short-document summarisation, we analyse NLI-based faithfulness evaluation for diverse summarisation tasks. We introduce DiverSumm, a new benchmark comprising long-form summarisation (long documents and summaries) and diverse summarisation tasks (e.g., meeting and multi-document summarisation). In experiments, InFusE obtains superior performance across the different summarisation tasks. Our code and data are available at https://github.com/HJZnlp/infuse.
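The idea of a variable premise size can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stopping rule (stop growing the premise once the score no longer improves), and the mean aggregation over sub-hypotheses are all assumptions made for illustration. A toy token-overlap scorer stands in for a real NLI model's entailment probability so that the example runs without external dependencies.

```python
from typing import Callable, List, Optional

Scorer = Callable[[str, str], float]


def overlap_score(premise: str, hypothesis: str) -> float:
    """Toy stand-in for an NLI entailment probability (an assumption):
    the fraction of hypothesis tokens that also occur in the premise."""
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    hits = sum(tok in premise_tokens for tok in hypothesis_tokens)
    return hits / len(hypothesis_tokens)


def variable_premise_score(
    doc_sents: List[str],
    hypothesis: str,
    scorer: Scorer = overlap_score,
    max_size: Optional[int] = None,
) -> float:
    """Grow the premise one document sentence at a time.

    Sentences are ranked by their individual score against the
    hypothesis, then appended in that order. Growth stops as soon as
    adding a sentence no longer improves the score (hypothetical rule).
    """
    ranked = sorted(doc_sents, key=lambda s: scorer(s, hypothesis), reverse=True)
    limit = len(ranked) if max_size is None else min(max_size, len(ranked))
    premise: List[str] = []
    best = 0.0
    for sent in ranked[:limit]:
        candidate = premise + [sent]
        score = scorer(" ".join(candidate), hypothesis)
        if premise and score <= best:
            break  # a larger premise did not help; keep the current one
        premise, best = candidate, score
    return best


def summary_faithfulness(
    doc_sents: List[str],
    sub_hypotheses: List[str],
    scorer: Scorer = overlap_score,
) -> float:
    """Average the per-hypothesis scores over the simplified summary units."""
    if not sub_hypotheses:
        return 0.0
    scores = [variable_premise_score(doc_sents, h, scorer) for h in sub_hypotheses]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    doc = ["alice wrote the report", "bob reviewed the report"]
    print(variable_premise_score(doc, "alice wrote and bob reviewed"))
    print(summary_faithfulness(doc, ["alice wrote the report", "carol deleted everything"]))
```

In this sketch, a hypothesis that draws on two document sentences scores higher once both are in the premise, which is the motivation for not fixing the premise size in advance. Replacing `overlap_score` with a call to an actual NLI model would be the natural next step.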
