CL, May 25, 2022

Counterfactual Data Augmentation improves Factuality of Abstractive Summarization

CMU
arXiv:2205.12416v1, 10 citations, h-index: 98
Originality: Incremental advance
AI Analysis

This paper addresses factual errors in abstractive summarization, which matter to users who rely on summaries for accurate information. The contribution is incremental because it builds on existing data augmentation methods.

The paper tackles the problem of factually inconsistent sentences in abstractive summarization systems by introducing a counterfactual data augmentation approach, which improves factual correctness by about 2.5 points on average on CNN/Dailymail and XSum datasets without significantly affecting ROUGE scores.

Abstractive summarization systems based on pretrained language models often generate coherent but factually inconsistent sentences. In this paper, we present a counterfactual data augmentation approach where we augment data with perturbed summaries that increase the training data diversity. Specifically, we present three augmentation approaches based on replacing (i) entities from other and the same category and (ii) nouns with their corresponding WordNet hypernyms. We show that augmenting the training data with our approach improves the factual correctness of summaries without significantly affecting the ROUGE score. We show that in two commonly used summarization datasets (CNN/Dailymail and XSum), we improve the factual correctness by about 2.5 points on average.
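
The three perturbations named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the entity inventory `ENTITIES`, the hypernym map `HYPERNYMS`, and the function names are all hypothetical stand-ins written by hand for illustration. A real pipeline would presumably obtain entity categories from a named-entity tagger and hypernyms from WordNet, and the abstract does not say how the perturbed summaries are fed into training, so that step is left out.

```python
import random
import re

# Hypothetical toy resources (assumptions, not from the paper).
# A real pipeline would likely use an NER tagger and WordNet instead.
ENTITIES = {
    "PERSON": ["Alice Smith", "Bob Jones"],
    "ORG": ["Acme Corp", "Globex"],
}
HYPERNYMS = {"dog": "canine", "car": "motor vehicle"}


def category_of(entity):
    """Return the category of a known entity, or None if it is unknown."""
    for category, names in ENTITIES.items():
        if entity in names:
            return category
    return None


def replace_entity(summary, entity, same_category, rng):
    """Swap `entity` for another entity from the same or a different category."""
    category = category_of(entity)
    if category is None or entity not in summary:
        return summary
    if same_category:
        pool = [e for e in ENTITIES[category] if e != entity]
    else:
        pool = [e for c, names in ENTITIES.items() if c != category for e in names]
    if not pool:
        return summary
    return summary.replace(entity, rng.choice(pool))


def replace_with_hypernym(summary):
    """Replace every word found in the toy hypernym map with its hypernym."""
    def swap(match):
        word = match.group(0)
        return HYPERNYMS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", swap, summary)


if __name__ == "__main__":
    rng = random.Random(0)
    summary = "Alice Smith bought a car from Acme Corp"
    print(replace_entity(summary, "Alice Smith", True, rng))   # same-category swap
    print(replace_entity(summary, "Alice Smith", False, rng))  # other-category swap
    print(replace_with_hypernym(summary))                      # hypernym swap
```

Each function returns a perturbed copy of the summary and leaves the input unchanged, so the original and perturbed versions can sit side by side in an augmented dataset.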