CL · Jun 26, 2021

A Training-free and Reference-free Summarization Evaluation Metric via Centrality-weighted Relevance and Self-referenced Redundancy

arXiv:2106.13945v1716 citations
Originality: Incremental advance
AI Analysis

The paper addresses the need among NLP researchers and practitioners for efficient, low-cost summarization evaluation metrics. The advance is incremental, since it builds on existing unsupervised evaluation concepts.

The paper tackles the problem of evaluating summaries without costly human references or training data. It proposes a metric that combines a centrality-weighted relevance score with a self-referenced redundancy score, and reports significantly better results than existing methods on both multi-document and single-document summarization evaluation.

In recent years, reference-based and supervised summarization evaluation metrics have been widely explored. However, collecting human-annotated references and ratings is costly and time-consuming. To avoid these limitations, we propose a training-free and reference-free summarization evaluation metric. Our metric consists of a centrality-weighted relevance score and a self-referenced redundancy score. The relevance score is computed between the pseudo reference built from the source document and the given summary, where the pseudo reference content is weighted by sentence centrality to provide importance guidance. Besides an $F_1$-based relevance score, we also design an $F_\beta$-based variant that pays more attention to the recall score. For the redundancy score, we compute a self-masked similarity score of the summary with itself to measure how much redundant information the summary contains. Finally, we combine the relevance and redundancy scores to produce the final evaluation score of the given summary. Extensive experiments show that our methods can significantly outperform existing methods on both multi-document and single-document summarization evaluation.
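The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each text unit is already a dense embedding vector, uses simple degree centrality over cosine similarities as the importance weight, and combines the two scores as `(relevance - lam * redundancy) / (1 + lam)`. The function names, the `lam` parameter, and the combination formula are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a @ b.T


def centrality_weights(ref):
    """Assumed degree centrality: summed similarity to the other units, normalised to 1."""
    sim = cosine_matrix(ref, ref)
    np.fill_diagonal(sim, 0.0)          # ignore self-similarity
    scores = np.clip(sim, 0.0, None).sum(axis=1)
    total = scores.sum()
    if total == 0.0:                    # no unit is similar to any other: fall back to uniform
        return np.full(len(ref), 1.0 / len(ref))
    return scores / total


def relevance(ref, summ, weights, beta=1.0):
    """F_beta between pseudo reference and summary; recall is centrality-weighted."""
    sim = cosine_matrix(ref, summ)
    recall = float(weights @ sim.max(axis=1))   # how well each reference unit is covered
    precision = float(sim.max(axis=0).mean())   # how well each summary unit is supported
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def redundancy(summ):
    """Self-masked similarity: each unit's best match among the *other* summary units."""
    if len(summ) < 2:
        return 0.0
    sim = cosine_matrix(summ, summ)
    np.fill_diagonal(sim, -np.inf)      # mask each unit's similarity with itself
    return float(sim.max(axis=1).mean())


def summary_score(ref, summ, beta=1.0, lam=0.5):
    """Assumed combination: reward relevance, penalise redundancy."""
    w = centrality_weights(ref)
    return (relevance(ref, summ, w, beta) - lam * redundancy(summ)) / (1 + lam)
```

Setting `beta` above 1 shifts the relevance score toward recall, which mirrors the $F_\beta$ variant mentioned in the abstract. The diagonal mask in `redundancy` is what makes the similarity "self-referenced" without letting every unit trivially match itself.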

Code Implementations: 1 repo