CL IR

May 1, 2022

Large-Scale Multi-Document Summarization with Information Extraction and Compression

arXiv:2205.00548v1

0.33 citations

h-index: 43

Originality: Incremental advance

AI Analysis

The paper addresses summarization of diverse document collections, but it appears to be an incremental advance that enhances existing methods.

The paper tackles multi-document summarization of heterogeneous documents (documents telling different stories rather than covering the same topic) without labeled data, and reports that its framework outperforms state-of-the-art methods in this more generic setting.

We develop an abstractive summarization framework independent of labeled data for multiple heterogeneous documents. Unlike existing multi-document summarization methods, our framework processes documents telling different stories instead of documents on the same topic. We also enhance an existing sentence fusion method with a uni-directional language model to prioritize fused sentences with higher sentence probability with the goal of increasing readability. Lastly, we construct a total of twelve dataset variations based on CNN/Daily Mail and the NewsRoom datasets, where each document group contains a large and diverse collection of documents to evaluate the performance of our model in comparison with other baseline systems. Our experiments demonstrate that our framework outperforms current state-of-the-art methods in this more generic setting.
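The idea of preferring fused sentences with higher sentence probability can be illustrated with a minimal sketch. This is not the paper's implementation: a toy add-one-smoothed bigram model stands in for the uni-directional language model, and the names `train_bigram_lm`, `avg_log_prob`, and `rank_candidates`, as well as the example sentences, are hypothetical. The sketch scores each candidate by its average left-to-right log-probability and ranks the more fluent candidate first.

```python
import math
from collections import Counter


def train_bigram_lm(corpus):
    """Count contexts and bigrams over whitespace-tokenized sentences.

    Assumption: a toy bigram model stands in for the uni-directional
    language model mentioned in the abstract.
    """
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous, current) pairs
    return contexts, bigrams, len(vocab)


def avg_log_prob(sentence, lm):
    """Average left-to-right log-probability per predicted token."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        prob = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total += math.log(prob)
    return total / (len(tokens) - 1)


def rank_candidates(candidates, lm):
    """Sort candidate fused sentences from most to least probable."""
    return sorted(candidates, key=lambda s: avg_log_prob(s, lm), reverse=True)


# Hypothetical data for illustration only.
corpus = [
    "the storm hit the coast on monday",
    "the storm closed schools on monday",
    "officials closed the coast road",
]
candidates = [
    "coast the monday hit storm on the",
    "the storm hit the coast on monday",
]
lm = train_bigram_lm(corpus)
for sentence in rank_candidates(candidates, lm):
    print(round(avg_log_prob(sentence, lm), 3), sentence)
```

Averaging the log-probability over tokens, rather than summing it, is a design choice made here so that longer fused sentences are not penalized simply for their length; whether the paper normalizes this way is not stated in the abstract.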
