Revisiting text decomposition methods for NLI-based factuality scoring of summaries
This work addresses factuality scoring for summaries and offers insights for NLP researchers, but its contribution is incremental because it builds on existing methods.
The paper systematically compares text decomposition granularities for NLI-based factuality scoring of summaries, finding that fine-grained decomposition is not always beneficial and that performance varies across datasets, with small methodological changes improving results.
Scoring the factuality of a generated summary involves measuring the degree to which the target text is factually supported by the input document. Given the similarities in the problem formulation, previous work has shown that Natural Language Inference (NLI) models can be effectively repurposed to perform this task. As these models are trained to score entailment at the sentence level, several recent studies have shown that decomposing either the input document or the summary into sentences helps with factuality scoring. But is fine-grained decomposition always a winning strategy? In this paper we systematically compare different granularities of decomposition, from document level to sub-sentence level, and we show that the answer is no. Our results show that incorporating additional context can yield improvements, but that this does not necessarily hold for all datasets. We also show that small changes to previously proposed entailment-based scoring methods can result in better performance, highlighting the need for caution in model and methodology selection for downstream tasks.
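To make the notion of decomposition granularity concrete, the following is a minimal sketch, not the paper's implementation. It assumes a common aggregation scheme (maximum over document chunks, mean over summary sentences), which the abstract does not specify. The names `split_sentences`, `toy_entailment`, and `factuality_score` are hypothetical, and `toy_entailment` is a token-overlap stand-in for a real NLI model, used only so that the example runs without external dependencies.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter on ., !, ? (assumption: a real system would use a proper tokenizer)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def toy_entailment(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model.

    Returns the fraction of hypothesis tokens that also occur in the premise.
    A real scorer would return an entailment probability instead.
    """
    premise_tokens = set(re.findall(r"\w+", premise.lower()))
    hypothesis_tokens = re.findall(r"\w+", hypothesis.lower())
    if not hypothesis_tokens:
        return 0.0
    matched = sum(tok in premise_tokens for tok in hypothesis_tokens)
    return matched / len(hypothesis_tokens)


def factuality_score(
    document: str,
    summary: str,
    entail: Callable[[str, str], float] = toy_entailment,
    doc_granularity: str = "sentence",
) -> float:
    """Score a summary against a document at a chosen premise granularity.

    doc_granularity="document" uses the whole document as a single premise;
    doc_granularity="sentence" uses each document sentence as a premise.
    Assumed aggregation: max over premises, then mean over summary sentences.
    """
    if doc_granularity == "document":
        premises = [document]
    else:
        premises = split_sentences(document)
    hypotheses = split_sentences(summary)
    if not premises or not hypotheses:
        return 0.0
    per_sentence = [max(entail(p, h) for p in premises) for h in hypotheses]
    return sum(per_sentence) / len(per_sentence)


if __name__ == "__main__":
    doc = "The cat sat on the mat. The dog barked at the mailman."
    summ = "The cat barked at the mailman."
    for granularity in ("document", "sentence"):
        print(granularity, factuality_score(doc, summ, doc_granularity=granularity))
```

In this toy run the same unfaithful summary receives different scores under the two granularities, because the summary sentence mixes content from two document sentences. The direction of the difference is an artifact of the overlap-based stand-in and says nothing about which granularity works better with a real NLI model.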