CL | Dec 16, 2021

QAFactEval: Improved QA-Based Factual Consistency Evaluation for Summarization

Alexander R. Fabbri, Chien-Sheng Wu, Wenhao Liu, Caiming Xiong

arXiv:2112.08542v2 | 32.2660 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for reliable factual consistency evaluation of summarization models, which is crucial for practical applications. The contribution is incremental, however, since it builds on existing QA-based methods.

The authors tackled the problem of evaluating factual consistency in text summarization by proposing QAFactEval, an optimized QA-based metric that achieves a 14% average improvement over previous QA-based metrics on the SummaC benchmark and outperforms the best entailment-based metric.

Factual consistency is an essential quality of text summarization models in practical settings. Existing work on evaluating this dimension falls broadly into two lines of research, entailment-based and question answering (QA)-based metrics, and different experimental setups often lead to contrasting conclusions about which paradigm performs best. In this work, we conduct an extensive comparison of entailment and QA-based metrics, demonstrating that carefully choosing the components of a QA-based metric, especially question generation and answerability classification, is critical to performance. Building on those insights, we propose an optimized metric, which we call QAFactEval, that leads to a 14% average improvement over previous QA-based metrics on the SummaC factual consistency benchmark, and also outperforms the best-performing entailment-based metric. Moreover, we find that QA-based and entailment-based metrics can offer complementary signals and be combined into a single metric for a further performance boost.
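The abstract treats a QA-based metric as a pipeline of swappable components: select answer spans from the summary, generate a question for each, answer that question against the source (or judge it unanswerable), and compare the two answers. The sketch below illustrates only that control flow, under loudly stated assumptions. The toy year selector, cloze-style question generator, and regex "answerer" are hypothetical placeholders for the learned models a real metric would use. SQuAD-style token F1 stands in for whatever answer-overlap scorer QAFactEval actually adopts. Scoring an unanswerable question as 0 and averaging over answers are simplifying choices made here, not details taken from the paper.

```python
import re
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical component types; a real metric would plug learned models in here.
AnswerSelector = Callable[[str], List[str]]             # summary -> answer spans
QuestionGenerator = Callable[[str, str], str]           # (summary, answer) -> question
QuestionAnswerer = Callable[[str, str], Optional[str]]  # (question, source) -> answer or None


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and articles, split into tokens (SQuAD-style)."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return [tok for tok in text.split() if tok not in {"a", "an", "the"}]


def token_f1(pred: str, gold: str) -> float:
    """Token-overlap F1 between two answer strings (stand-in overlap scorer)."""
    p, g = normalize(pred), normalize(gold)
    common = sum((Counter(p) & Counter(g)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(p), common / len(g)
    return 2 * precision * recall / (precision + recall)


def qa_consistency(
    source: str,
    summary: str,
    select: AnswerSelector,
    generate: QuestionGenerator,
    answer: QuestionAnswerer,
) -> float:
    """Average per-answer score; 0.0 if no answers are selected (arbitrary choice)."""
    scores = []
    for summ_ans in select(summary):
        question = generate(summary, summ_ans)
        src_ans = answer(question, source)
        # Answerability check: a question the source cannot answer scores 0.
        scores.append(0.0 if src_ans is None else token_f1(src_ans, summ_ans))
    return sum(scores) / len(scores) if scores else 0.0


# --- Toy, hypothetical components for illustration only ---
def select_years(summary: str) -> List[str]:
    """Pick four-digit numbers as candidate answer spans."""
    return re.findall(r"\b\d{4}\b", summary)


def cloze_question(summary: str, ans: str) -> str:
    """Mask the first occurrence of the answer to form a cloze-style question."""
    return summary.replace(ans, "[MASK]", 1)


def toy_answer(question: str, source: str) -> Optional[str]:
    """Return the source word after the 3 words preceding [MASK], else None."""
    context = question.split("[MASK]")[0].split()[-3:]
    if not context:
        return None
    pattern = r"\s+".join(map(re.escape, context)) + r"\s+(\w+)"
    match = re.search(pattern, source, flags=re.IGNORECASE)
    return match.group(1) if match else None


source = "The bridge opened in 1999 after years of delays."
for summ in ["The bridge opened in 1999.",   # consistent -> 1.0
             "The bridge opened in 1998.",   # wrong year -> 0.0
             "The tunnel opened in 1999."]:  # unanswerable -> 0.0
    print(summ, qa_consistency(source, summ, select_years, cloze_question, toy_answer))
```

Because each stage is passed in as a function, replacing one component (for example, a different question generator or answerability check) leaves the rest of the pipeline untouched. This mirrors the paper's point that the choice of individual components drives the metric's performance.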

