A Semantically Motivated Approach to Compute ROUGE Scores
This work addresses a key limitation of text summarization evaluation that affects both researchers and practitioners, though the contribution is incremental because it builds on the existing ROUGE metrics.
The paper addresses ROUGE's inability to fairly evaluate abstractive summaries, which stems from its reliance on surface similarities. It proposes a semantically motivated approach that incorporates both lexical and semantic similarities, and it reports significantly better correlation with human judgments on the TAC AESOP datasets.
ROUGE is one of the first and most widely used evaluation metrics for text summarization. However, its assessment relies solely on surface similarities between peer and model summaries. Consequently, ROUGE is unable to fairly evaluate abstractive summaries that include lexical variations and paraphrasing. To explore the effectiveness of lexical resource-based models in addressing this issue, we integrate a graph-based algorithm into ROUGE to capture the semantic similarities between peer and model summaries. Our semantically motivated approach computes ROUGE scores based on both lexical and semantic similarities. Experimental results on the TAC AESOP datasets indicate that exploiting the lexico-semantic similarity of the words used in summaries significantly helps ROUGE correlate better with human judgments.
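To make the contrast between surface and lexico-semantic matching concrete, the following is a minimal Python sketch, not the method described in the paper. It compares standard ROUGE-1 recall with a "soft" variant that also credits a model-summary word when the peer summary contains a sufficiently similar word. The similarity table `TOY_SIMILARITY`, the function names, and the `threshold` parameter are all hypothetical stand-ins: the abstract says only that a graph-based algorithm over a lexical resource supplies the semantic similarity, so a hand-written lookup table takes its place here.

```python
from collections import Counter

# Hypothetical toy lexical resource (an assumption for illustration only):
# unordered word pairs mapped to a similarity score in [0, 1].
TOY_SIMILARITY = {
    ("car", "automobile"): 1.0,
    ("big", "large"): 0.9,
}


def word_similarity(a, b):
    """Return 1.0 for identical words, else look the pair up in the toy table."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get((a, b), TOY_SIMILARITY.get((b, a), 0.0))


def rouge1_recall(peer, model):
    """Surface ROUGE-1 recall: clipped unigram overlap divided by model length."""
    if not model:
        return 0.0
    peer_counts = Counter(peer)
    model_counts = Counter(model)
    overlap = sum(min(count, peer_counts[word]) for word, count in model_counts.items())
    return overlap / len(model)


def soft_rouge1_recall(peer, model, threshold=0.8):
    """ROUGE-1 recall that also accepts semantically similar words.

    Each peer token can be matched at most once. Exact matches are taken
    first; each remaining model token is then paired with its most similar
    unused peer token if the similarity reaches `threshold`.
    """
    if not model:
        return 0.0
    remaining = list(peer)
    matched = 0
    unmatched = []
    # Pass 1: exact (lexical) matches.
    for word in model:
        if word in remaining:
            remaining.remove(word)
            matched += 1
        else:
            unmatched.append(word)
    # Pass 2: semantic matches against peer tokens not yet used.
    for word in unmatched:
        best = max(remaining, key=lambda p: word_similarity(word, p), default=None)
        if best is not None and word_similarity(word, best) >= threshold:
            remaining.remove(best)
            matched += 1
    return matched / len(model)


peer_summary = "the automobile is large".split()
model_summary = "the car is big".split()
print(rouge1_recall(peer_summary, model_summary))       # 0.5
print(soft_rouge1_recall(peer_summary, model_summary))  # 1.0
```

In this toy case the peer summary paraphrases the model summary, so surface matching credits only "the" and "is", while the soft variant credits all four words. A real system would replace the lookup table with similarity scores derived from a lexical resource.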