Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model
This work addresses faithfulness evaluation for summarization models, offering NLP researchers a more efficient alternative to prompting large models such as ChatGPT.
The paper addresses the evaluation of faithfulness in text summarization by proposing FFLM, a zero-shot faithfulness evaluation metric built on a moderately-sized foundation language model. Experiments show that FFLM performs competitively with or outperforms ChatGPT on inconsistency detection and faithfulness rating with 24x fewer parameters.
Despite tremendous improvements in natural language generation, summarization models still suffer from unfaithfulness. Previous work evaluates faithfulness either by using models trained on other tasks or on in-domain synthetic data, or by prompting a large model such as ChatGPT. This paper proposes zero-shot faithfulness evaluation with only a moderately-sized foundation language model. We introduce a new metric, FFLM, which combines probability changes based on the intuition that prefixing a piece of text that is consistent with the output will increase the probability of predicting the output. Experiments show that FFLM performs competitively with or even outperforms ChatGPT on both inconsistency detection and faithfulness rating with 24x fewer parameters. FFLM also achieves improvements over other strong baselines.
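The intuition behind the metric can be illustrated with a minimal sketch. It is not the FFLM formula itself, since the abstract does not specify how the probability changes are combined. It assumes the per-token probabilities of the output have already been obtained from a language model, once with a prefix and once without. The function names `mean_log_prob` and `probability_change` are hypothetical and introduced here only for illustration.

```python
import math
from typing import Sequence


def mean_log_prob(token_probs: Sequence[float]) -> float:
    """Average log-probability of the output tokens."""
    if not token_probs:
        raise ValueError("token_probs must be non-empty")
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def probability_change(
    probs_with_prefix: Sequence[float],
    probs_without_prefix: Sequence[float],
) -> float:
    """Change in average log-probability of the output caused by the prefix.

    Illustrative assumption: a positive value suggests the prefix is
    consistent with the output; a negative value suggests it is not.
    """
    if len(probs_with_prefix) != len(probs_without_prefix):
        raise ValueError("both sequences must score the same output tokens")
    return mean_log_prob(probs_with_prefix) - mean_log_prob(probs_without_prefix)


# Toy numbers, not measured from any model.
consistent = probability_change([0.9, 0.8, 0.7], [0.3, 0.2, 0.1])
inconsistent = probability_change([0.1, 0.2, 0.1], [0.3, 0.2, 0.1])
```

With these toy values, `consistent` is positive and `inconsistent` is negative, matching the stated intuition. In practice the per-token probabilities would come from the foundation language model, and the metric combines several such changes rather than a single one.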