Improving abstractive summarization with energy-based re-ranking
This work addresses the unreliability of abstractive summarization in real-world applications; its contribution is incremental, since it builds on existing metrics and re-ranking techniques.
The paper tackles weaknesses of abstractive summarization, such as omissions and factual inconsistencies, by proposing an energy-based model that re-ranks summaries using recent automatic metrics. Re-ranking consistently improves automatic scores, but human evaluation reveals limitations for highly abstractive summaries.
Current abstractive summarization systems have important weaknesses that prevent their deployment in real-world applications, such as the omission of relevant information and the generation of factual inconsistencies (also known as hallucinations). At the same time, recently proposed automatic evaluation metrics such as CTC scores exhibit a higher correlation with human judgments than traditional lexical-overlap metrics such as ROUGE. In this work, we intend to close the loop by leveraging these recent advances in summarization metrics to create quality-aware abstractive summarizers. Namely, we propose an energy-based model that learns to re-rank summaries according to one or a combination of these metrics. We train our energy-based re-ranker with several metrics and show that it consistently improves the scores achieved by the predicted summaries. Nonetheless, human evaluation results show that the re-ranking approach should be used with care for highly abstractive summaries, as the available metrics are not yet sufficiently reliable for this purpose.
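To make the re-ranking idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a set of candidate summaries is already available for each source document and that lower energy means a better summary. The names `toy_energy`, `rerank`, and `pairwise_margin_loss` are hypothetical. The token-overlap energy is only a stand-in for a learned model, and the pairwise margin loss is one common ranking objective, which may differ from the one the authors use.

```python
from typing import Callable, List


def toy_energy(source: str, summary: str) -> float:
    """Hypothetical stand-in for a learned energy function (lower is better).

    Returns minus the fraction of summary tokens that also occur in the
    source. A real re-ranker would use a trained neural scorer instead.
    """
    source_tokens = set(source.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    overlap = sum(token in source_tokens for token in summary_tokens)
    return -overlap / len(summary_tokens)


def rerank(
    source: str,
    candidates: List[str],
    energy: Callable[[str, str], float] = toy_energy,
) -> List[str]:
    """Sort candidate summaries from lowest to highest energy."""
    return sorted(candidates, key=lambda candidate: energy(source, candidate))


def pairwise_margin_loss(
    energy_better: float, energy_worse: float, margin: float = 1.0
) -> float:
    """Assumed training signal: zero once the candidate that the target
    metric prefers has an energy at least `margin` below the other one."""
    return max(0.0, margin + energy_better - energy_worse)


if __name__ == "__main__":
    source = "the cat sat on the mat"
    candidates = ["a dog ran away", "the cat sat"]
    # The candidate with the lowest energy comes first.
    print(rerank(source, candidates))
```

In this sketch the target metric (for example a CTC score) would decide which candidate of a pair counts as better during training, and only the energy function is needed at inference time to choose among candidates.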