CL · Apr 23, 2024

CASPR: Automated Evaluation Metric for Contrastive Summarization

Nirupan Ananthamurugan, Dat Duong, Philip George, Ankita Gupta, Sandeep Tata, Beliz Gunel

arXiv:2404.15565v2 · 1.0 · h-index: 3 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the need for automated evaluation in contrastive summarization, which aids decision-making, but it is an incremental advance because it builds on existing methods.

The paper tackles the problem of automatically evaluating contrastive summarization by proposing CASPR, a metric that uses natural language inference to measure contrast between summary pairs, showing it more reliably captures contrastiveness than prior baselines on the CoCoTRIP dataset.

Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluations remains an open problem. Prior work has proposed a token-overlap based metric, Distinctiveness Score, to measure contrast, but it is sensitive to meaning-preserving lexical variations. In this work, we propose an automated evaluation metric, CASPR, to better measure contrast between a pair of summaries. Our metric is based on a simple and lightweight method that leverages the natural language inference (NLI) task to measure contrast by segmenting reviews into single-claim sentences and carefully aggregating NLI scores between them to come up with a summary-level score. We compare CASPR with Distinctiveness Score and a simple yet powerful baseline based on BERTScore. Our results on a prior dataset, CoCoTRIP, demonstrate that CASPR can more reliably capture the contrastiveness of the summary pairs compared to the baselines.
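The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the segmentation procedure or the aggregation formula, so the naive sentence splitter, the max-then-mean aggregation, the Jaccard-based overlap score, and the keyword-based `toy_contradiction_prob` (a stand-in for a real NLI model) are all assumptions made for illustration.

```python
import re
from typing import Callable, List

# Hypothetical interface: (premise, hypothesis) -> contradiction probability in [0, 1].
ContradictionFn = Callable[[str, str], float]


def split_sentences(summary: str) -> List[str]:
    """Naive splitter standing in for single-claim segmentation (assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", summary.strip())
    return [p for p in parts if p]


def token_overlap_contrast(a: str, b: str) -> float:
    """1 - Jaccard overlap of word sets; an illustrative token-overlap score,
    not the published Distinctiveness Score definition."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def nli_contrast(a: str, b: str, contradiction_prob: ContradictionFn) -> float:
    """Aggregate sentence-level contradiction scores into a summary-level score.

    Assumed aggregation: for each sentence, take its strongest contradiction
    against the other summary, average per direction, then average directions.
    """
    sents_a, sents_b = split_sentences(a), split_sentences(b)
    if not sents_a or not sents_b:
        return 0.0

    def directed(src: List[str], tgt: List[str]) -> float:
        return sum(max(contradiction_prob(s, t) for t in tgt) for s in src) / len(src)

    return 0.5 * (directed(sents_a, sents_b) + directed(sents_b, sents_a))


# Toy stand-in for an NLI model (assumption): opposite-polarity keywords contradict.
POSITIVE = {"clean", "spotless", "friendly"}
NEGATIVE = {"dirty", "rude"}


def toy_contradiction_prob(premise: str, hypothesis: str) -> float:
    p = set(re.findall(r"\w+", premise.lower()))
    h = set(re.findall(r"\w+", hypothesis.lower()))
    opposite = (p & POSITIVE and h & NEGATIVE) or (p & NEGATIVE and h & POSITIVE)
    return 1.0 if opposite else 0.0


if __name__ == "__main__":
    original = "The rooms are clean. The staff is friendly."
    paraphrase = "The rooms are spotless. The staff is friendly."
    opposite = "The rooms are dirty. The staff is rude."
    print(token_overlap_contrast(original, paraphrase))  # nonzero despite same meaning
    print(nli_contrast(original, paraphrase, toy_contradiction_prob))
    print(nli_contrast(original, opposite, toy_contradiction_prob))
```

In this toy setup the token-overlap score reports some contrast between two summaries that say the same thing in different words, while the NLI-style score does not, which is the sensitivity to lexical variation that the abstract points out. In practice `toy_contradiction_prob` would be replaced by a trained NLI classifier.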
