Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization
This work addresses the need for better evaluation metrics in extractive summarization by extending the existing Sem-nCG metric to capture both importance and diversity; the contribution is incremental, since it builds directly on Sem-nCG.
The paper tackles the problem of evaluating extractive summarization by proposing a redundancy-aware version of the Sem-nCG metric that also supports multiple reference summaries, resulting in a stronger correlation with human judgments than previous metrics such as the original Sem-nCG, ROUGE, and BERTScore.
The ROUGE metric is commonly used to evaluate extractive summarization, but it has been criticized for its lack of semantic awareness and its disregard for the ranking quality of the extractive summarizer. Previous research introduced a gain-based automated metric called Sem-nCG that addresses these issues, as it is both rank-aware and semantic-aware. However, Sem-nCG does not consider the amount of redundancy present in a model summary and does not currently support evaluation with multiple reference summaries. A good model summary must balance importance and diversity, but finding a metric that captures both of these aspects is challenging. In this paper, we propose a redundancy-aware Sem-nCG metric and demonstrate how the revised metric can be used to evaluate model summaries against multiple references, which was missing in previous research. Experimental results demonstrate that the revised Sem-nCG metric has a stronger correlation with human judgments than the previous Sem-nCG metric and the traditional ROUGE and BERTScore metrics in both single-reference and multiple-reference scenarios.
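To make the idea of a gain-based, redundancy-aware score concrete, the following is a minimal sketch and not the paper's actual formulation. Everything in it is an illustrative assumption: token-level Jaccard overlap stands in for a semantic similarity model, gains are assigned by rank and normalized to sum to one, multiple references are handled by averaging the per-reference scores, and importance and diversity are mixed with a hypothetical weight `lam`. The function names are invented for this example.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-overlap similarity; a stand-in (assumption) for a semantic model."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def rank_gains(doc, reference):
    """Assign each document sentence a gain from its similarity rank.

    The most similar sentence gets weight n, the least similar gets 1,
    and the weights are normalized to sum to 1 (an assumed gain scheme).
    """
    n = len(doc)
    sims = [jaccard(s, reference) for s in doc]
    order = sorted(range(n), key=lambda i: sims[i], reverse=True)
    total = n * (n + 1) / 2
    gains = [0.0] * n
    for rank, idx in enumerate(order):
        gains[idx] = (n - rank) / total
    return gains


def ncg_at_k(gains, selected):
    """Cumulative gain of the selected sentences over the ideal top-k gain."""
    ideal = sum(sorted(gains, reverse=True)[: len(selected)])
    if ideal == 0:
        return 0.0
    return sum(gains[i] for i in selected) / ideal


def redundancy(doc, selected):
    """Mean pairwise similarity among the selected sentences."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(doc[i], doc[j]) for i, j in pairs) / len(pairs)


def redundancy_aware_score(doc, references, selected, lam=0.5):
    """Mix importance (mean nCG over references) with diversity (1 - redundancy).

    `lam` is a hypothetical trade-off weight, not a value from the paper.
    """
    per_ref = [ncg_at_k(rank_gains(doc, r), selected) for r in references]
    importance = sum(per_ref) / len(per_ref)
    diversity = 1.0 - redundancy(doc, selected)
    return lam * importance + (1 - lam) * diversity


if __name__ == "__main__":
    doc = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs bark loudly at night",
    ]
    references = ["the cat sat on the mat", "dogs bark at night"]
    # A summary that repeats a sentence vs. one that covers both references.
    print(redundancy_aware_score(doc, references, [0, 1]))
    print(redundancy_aware_score(doc, references, [0, 2]))
```

Under these assumptions, a summary that repeats the same sentence scores lower than one that covers both references, because the repeated pair contributes no diversity and also misses the content of the second reference.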