CL · Sep 8, 2023

Unsupervised Multi-document Summarization with Holistic Inference

Stanford · Tencent
arXiv:2309.04087v1 · 125 citations · h-index: 83
Originality: Highly original
AI Analysis

This paper addresses the problem of extracting core information from document collections for users who need concise summaries. It is an incremental advance that offers a novel method for a known bottleneck.

The paper tackles unsupervised multi-document summarization with a holistic framework built around the Subset Representative Index (SRI), which balances importance and diversity. It reports significant improvements in ROUGE scores and diversity measures over strong baselines.

Multi-document summarization aims to extract the core information from a collection of documents written on the same topic. This paper proposes a new holistic framework for unsupervised multi-document extractive summarization. Our method couples a holistic beam search inference procedure with a holistic measure, the Subset Representative Index (SRI). SRI balances the importance and diversity of a subset of sentences from the source documents and can be computed in both unsupervised and adaptive settings. To demonstrate the effectiveness of our method, we conduct extensive experiments on both small- and large-scale multi-document summarization datasets under both unsupervised and adaptive settings. The proposed method outperforms strong baselines by a significant margin, as indicated by ROUGE scores and diversity measures. Our findings also suggest that diversity is essential for improving multi-document summarization performance.
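The abstract describes SRI and holistic beam search only at a high level. The following is therefore a minimal Python sketch under stated assumptions and not the paper's method. It scores every partial summary as a whole subset and keeps the best few subsets at each step. The scoring function `sri_like_score` is hypothetical. It assumes bag-of-words cosine similarity, treats importance as the mean similarity of the chosen sentences to the collection centroid, and treats redundancy as their mean pairwise similarity. It then trades the two off with an assumed weight `lam`. The real SRI definition may differ.

```python
import math
from collections import Counter


def bow(sentence):
    """Bag-of-words vector: lowercase whitespace tokens -> counts (assumed representation)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sri_like_score(subset, vecs, centroid, lam):
    """Hypothetical stand-in for SRI: importance minus lam * redundancy.

    importance = mean similarity of the chosen sentences to the collection centroid
    redundancy = mean pairwise similarity among the chosen sentences
    """
    importance = sum(cosine(vecs[i], centroid) for i in subset) / len(subset)
    pairs = [(i, j) for n, i in enumerate(subset) for j in subset[n + 1:]]
    redundancy = (
        sum(cosine(vecs[i], vecs[j]) for i, j in pairs) / len(pairs) if pairs else 0.0
    )
    return importance - lam * redundancy


def holistic_beam_search(sentences, k, beam_width=3, lam=0.5):
    """Select up to k sentences, scoring each partial summary as a whole subset."""
    if k <= 0:
        return []
    vecs = [bow(s) for s in sentences]
    centroid = Counter()
    for v in vecs:
        centroid.update(v)  # centroid = sum of all sentence vectors
    beams = [()]  # each beam is a sorted tuple of sentence indices
    for _ in range(k):
        candidates = {}
        for subset in beams:
            for i in range(len(sentences)):
                if i in subset:
                    continue
                extended = tuple(sorted(subset + (i,)))
                if extended not in candidates:  # same subset reached via different orders
                    candidates[extended] = sri_like_score(extended, vecs, centroid, lam)
        if not candidates:  # fewer than k sentences available
            break
        # Keep the beam_width highest-scoring subsets, not the best single sentences.
        beams = sorted(candidates, key=candidates.get, reverse=True)[:beam_width]
    return [sentences[i] for i in beams[0]]
```

Consider two identical sentences and one unrelated sentence with `k=2`. Setting `lam=0.0` (importance only) selects the duplicate pair. Setting `lam=0.5` replaces one copy with the unrelated sentence, which illustrates why a diversity term helps.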
