Multi-News: a Large-Scale Multi-Document Summarization Dataset and Abstractive Hierarchical Model
This work addresses the problem of summarizing multiple news articles, which is relevant to researchers and practitioners in natural language processing. Its modeling contribution is incremental, as it builds on existing summarization methods.
The authors address the lack of large datasets for multi-document summarization (MDS) of news articles by introducing Multi-News, the first large-scale MDS news dataset. They also propose an end-to-end model that combines a traditional extractive summarization model with a standard single-document summarization model and achieves competitive results on MDS datasets.
Automatic generation of summaries from multiple news articles is a valuable tool as the number of online publications grows rapidly. Single-document summarization (SDS) systems have benefited from advances in neural encoder-decoder models thanks to the availability of large datasets. However, multi-document summarization (MDS) of news articles has been limited to datasets of a couple of hundred examples. In this paper, we introduce Multi-News, the first large-scale MDS news dataset. Additionally, we propose an end-to-end model which incorporates a traditional extractive summarization model with a standard SDS model and achieves competitive results on MDS datasets. We benchmark several methods on Multi-News and release our data and code in the hope that this work will promote advances in summarization in the multi-document setting.