CL · Jul 10, 2020

SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics

arXiv:2007.05374v1 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses inconsistent and cumbersome evaluation processes for summarization metrics and primarily benefits NLP researchers. It is incremental, however, since it builds on existing metrics and datasets.

The authors address the difficulty of evaluating summarization metrics by introducing SacreROUGE, an open-source library that standardizes the interfaces of existing metrics, automates correlation testing against human judgments, and simplifies dataset loading. The result is a tool that reduces the amount of code researchers must write.
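Testing a metric against human judgments usually means correlating the two sets of scores at more than one granularity. The sketch below, in dependency-free Python, computes summary-level correlation (correlate within each input, then average) and system-level correlation (average each system's scores over inputs, then correlate across systems) with Pearson and Spearman. The `scores[instance_id][system_id]` layout and every function name here are hypothetical illustrations, not SacreROUGE's API, and the toy numbers are made up.

```python
import math
from collections.abc import Callable
from statistics import mean

# Hypothetical data layout (not SacreROUGE's): scores[instance_id][system_id],
# where an "instance" is one input document summarized by every system.
Scores = dict[str, dict[str, float]]
Corr = Callable[[list[float], list[float]], float]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson's r; returns NaN when either input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson's r computed on ranks."""
    return pearson(ranks(x), ranks(y))


def summary_level(metric: Scores, human: Scores, corr: Corr) -> float:
    """Correlate across systems within each instance, then average.

    Instances where the correlation is undefined (NaN) are skipped."""
    per_instance = []
    for inst in metric:
        systems = sorted(metric[inst])
        r = corr([metric[inst][s] for s in systems],
                 [human[inst][s] for s in systems])
        if not math.isnan(r):
            per_instance.append(r)
    return mean(per_instance)


def system_level(metric: Scores, human: Scores, corr: Corr) -> float:
    """Average each system over instances, then correlate across systems.

    Assumes every system has a score for every instance."""
    systems = sorted({s for inst in metric for s in metric[inst]})
    m = [mean(metric[i][s] for i in metric) for s in systems]
    h = [mean(human[i][s] for i in human) for s in systems]
    return corr(m, h)


if __name__ == "__main__":
    # Made-up toy scores for three systems (A, B, C) on two inputs.
    metric = {"d1": {"A": 0.30, "B": 0.25, "C": 0.40},
              "d2": {"A": 0.35, "B": 0.20, "C": 0.38}}
    human = {"d1": {"A": 3.0, "B": 2.0, "C": 4.0},
             "d2": {"A": 4.0, "B": 2.5, "C": 3.5}}
    print("summary-level Spearman:", round(summary_level(metric, human, spearman), 3))
    print("system-level Spearman:", round(system_level(metric, human, spearman), 3))
```

Summary-level correlation asks whether a metric orders competing summaries of the same input the way humans do, while system-level correlation asks whether it orders whole systems correctly. On the toy data the two disagree (0.75 versus 1.0), which is one reason to report both.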

We present SacreROUGE, an open-source library for using and developing summarization evaluation metrics. SacreROUGE removes many obstacles that researchers face when using or developing metrics: (1) The library provides Python wrappers around the official implementations of existing evaluation metrics so they share a common, easy-to-use interface; (2) it provides functionality to evaluate how well any metric implemented in the library correlates with human-annotated judgments, so no additional code needs to be written for a new evaluation metric; and (3) it includes scripts for loading datasets that contain human judgments so they can easily be used for evaluation. This work describes the design of the library, including the core Metric interface, the command-line API for evaluating summarization models and metrics, and the scripts to load and reformat publicly available datasets. The development of SacreROUGE is ongoing and open to contributions from the community.
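To illustrate what a shared metric interface buys, here is a minimal sketch assuming a hypothetical abstract `Metric` class with one `score(summary, references)` method that returns named scores. It is loosely inspired by the Metric interface described above but is not SacreROUGE's actual class or signature. The toy `UnigramOverlap` subclass is a simplified ROUGE-1-style overlap (lowercased whitespace tokens, no stemming, scores averaged over references) and is not the official ROUGE implementation.

```python
from abc import ABC, abstractmethod
from collections import Counter


class Metric(ABC):
    """Hypothetical common interface: score one summary against its
    reference summaries and return a dict of named scores."""

    @abstractmethod
    def score(self, summary: str, references: list[str]) -> dict[str, float]:
        """Score a single summary."""

    def score_all(self, summaries: list[str],
                  references: list[list[str]]) -> list[dict[str, float]]:
        """Score many summaries; subclasses may override to batch work."""
        return [self.score(s, refs) for s, refs in zip(summaries, references)]


def _prf(pred: Counter, ref: Counter) -> tuple[float, float, float]:
    """Precision, recall and F1 from clipped unigram overlap."""
    overlap = sum((pred & ref).values())  # & keeps the min count per token
    p = overlap / sum(pred.values()) if pred else 0.0
    r = overlap / sum(ref.values()) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


class UnigramOverlap(Metric):
    """Toy ROUGE-1-style metric: lowercased whitespace tokens, no stemming,
    scores averaged over references. Not the official ROUGE scorer."""

    def score(self, summary: str, references: list[str]) -> dict[str, float]:
        if not references:
            raise ValueError("at least one reference summary is required")
        pred = Counter(summary.lower().split())
        results = [_prf(pred, Counter(ref.lower().split())) for ref in references]
        n = len(results)
        return {
            "precision": sum(p for p, _, _ in results) / n,
            "recall": sum(r for _, r, _ in results) / n,
            "f1": sum(f for _, _, f in results) / n,
        }


if __name__ == "__main__":
    metric: Metric = UnigramOverlap()
    print(metric.score("the cat sat", ["the cat sat on the mat"]))
```

Here all three summary tokens appear in the six-token reference, so precision is 1.0 and recall is 0.5. Because callers depend only on `score` and `score_all`, evaluation code written once against the base class works unchanged for any subclass, including wrappers around external scorers.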

