CL · Mar 8, 2024

ROUGE-K: Do Your Summaries Have Keywords?

arXiv:2403.05186v1 · 27 citations · h-index: 12 · Has Code · STARSEM
Originality: Incremental advance
AI Analysis

The paper addresses a previously overlooked aspect of summarization evaluation and gives developers a tool for assessing keyword inclusion. The advance is incremental, since it builds on existing metrics such as ROUGE.

The paper tackles the evaluation of keyword inclusion in extreme summarization by introducing ROUGE-K, a metric that quantifies how well summaries contain keywords. It finds that a strong baseline model often misses essential information, and human annotators confirm that summaries with more keywords are more relevant to the source documents.

Keywords, that is, content-relevant words in summaries, play an important role in efficient information conveyance, making it critical to assess during evaluation whether system-generated summaries contain such informative words. However, existing evaluation metrics for extreme summarization models do not pay explicit attention to keywords in summaries, leaving developers unaware of their presence. To address this issue, we present a keyword-oriented evaluation metric, dubbed ROUGE-K, which provides a quantitative answer to the question: "How well do summaries include keywords?" Through the lens of this keyword-aware metric, we surprisingly find that a current strong baseline model often misses essential information in its summaries. Our analysis reveals that human annotators indeed find the summaries with more keywords to be more relevant to the source documents. This is an important yet previously overlooked aspect in evaluating summarization systems. Finally, to enhance keyword inclusion, we propose four approaches for incorporating word importance into a transformer-based model and experimentally show that they guide models to include more keywords while maintaining overall quality. Our code is released at https://github.com/sobamchan/rougek.
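
To make the idea of a keyword-inclusion score concrete, here is a minimal sketch that treats it as keyword recall: the fraction of a given keyword list that appears in a summary. This is an illustration under stated assumptions, not the released implementation. The function names `tokenize` and `keyword_recall` are hypothetical, the keyword list is supplied by hand, and the abstract does not specify how ROUGE-K extracts or matches keywords, so the repository linked above remains the reference.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_recall(summary: str, keywords: list[str]) -> float:
    """Return the fraction of keywords that occur in the summary.

    Assumption: a keyword counts as present if its tokens appear as a
    contiguous token sequence in the summary. Matching is on whole tokens,
    so "summar" does not match "summarization".
    """
    if not keywords:
        return 0.0
    # Pad with spaces so that substring search respects token boundaries.
    padded = " " + " ".join(tokenize(summary)) + " "
    hits = 0
    for keyword in keywords:
        kw_tokens = tokenize(keyword)
        if kw_tokens and " " + " ".join(kw_tokens) + " " in padded:
            hits += 1
    return hits / len(keywords)


summary = "We propose a keyword-aware metric for summarization."
score = keyword_recall(summary, ["metric", "summarization", "transformer"])
print(f"{score:.2f}")  # two of the three keywords are present
```

A recall-style score like this rewards summaries for covering the keywords without penalizing extra words, which matches the question the abstract poses about inclusion rather than precision.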

Code Implementations: 1 repo
