Event-based evaluation of abstractive news summarization
This work addresses the evaluation of abstractive summarization for news articles. Its event-level comparison is more nuanced than traditional overlap-based methods, but the contribution is incremental, since it builds on existing evaluation frameworks and adds a specific focus on events.
The paper tackles the problem of evaluating abstractive news summaries by proposing an event-based evaluation method that calculates overlapping events between generated summaries, reference summaries, and the original articles. The experiments use a richly annotated Norwegian dataset containing event annotations and summaries written by expert human annotators, and the method gives more insight into the event information the summaries contain.
An abstractive summary of a news article contains its most important information in a condensed version. Summaries generated automatically by generative language models are usually evaluated against human-authored summaries used as gold references, by calculating overlapping units or similarity scores. News articles report events, and ideally so should the summaries. In this work, we propose to evaluate the quality of abstractive summaries by calculating overlapping events between generated summaries, reference summaries, and the original news articles. We experiment on a richly annotated Norwegian dataset comprising both event annotations and summaries authored by expert human annotators. Our approach provides more insight into the event information contained in the summaries.
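The idea of calculating overlapping events can be illustrated with a minimal sketch. The abstract does not specify how events are represented or matched, so everything here is an assumption: events are modeled as hypothetical (trigger, participant) tuples, matching is exact set intersection, and the function name `event_overlap` and the toy data are invented for illustration only.

```python
from typing import Dict, Hashable, Iterable


def event_overlap(candidate: Iterable[Hashable],
                  reference: Iterable[Hashable]) -> Dict[str, float]:
    """Precision, recall, and F1 of candidate events against reference events.

    Assumption: two events match only if they are exactly equal.
    The paper's actual matching criterion may differ.
    """
    cand, ref = set(candidate), set(reference)
    shared = cand & ref
    precision = len(shared) / len(cand) if cand else 0.0
    recall = len(shared) / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy, made-up events as (trigger, participant) pairs.
article_events = {("resign", "minister"), ("announce", "government"), ("protest", "union")}
reference_events = {("resign", "minister"), ("announce", "government")}
generated_events = {("resign", "minister"), ("protest", "union")}

# Compare the generated summary with the reference summary and with the article.
vs_reference = event_overlap(generated_events, reference_events)
vs_article = event_overlap(generated_events, article_events)
print(vs_reference)
print(vs_article)
```

In this toy case the generated summary shares only one event with the reference, yet both of its events appear in the article. Scoring against the article as well as the reference is what makes that difference visible.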