An Unsupervised Masking Objective for Abstractive Multi-Document News Summarization
This work addresses the problem of generating high-quality summaries from multiple documents without labeled data. It offers an unsupervised approach that could reduce reliance on costly annotations for news summarization.
The paper tackles abstractive multi-document news summarization with an unsupervised masking objective: the model is trained to predict a masked-out source document, chosen as the one with the highest lexical centrality within its document group. On the Multi-News dataset, the resulting system outperforms past unsupervised methods and, in human evaluation, surpasses the best supervised method, all without using ground-truth summaries.
We show that a simple unsupervised masking objective can approach supervised performance on abstractive multi-document news summarization. Our method trains a state-of-the-art neural summarization model to predict the masked-out source document with the highest lexical centrality relative to the multi-document group. In experiments on the Multi-News dataset, our masked training objective yields a system that outperforms past unsupervised methods and, in human evaluation, surpasses the best supervised method without requiring access to any ground-truth summaries. Further, we evaluate how different measures of lexical centrality, inspired by past work on extractive summarization, affect final performance.
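
As a rough illustration of the masking objective described above, the sketch below builds one training pair from a document group: the most lexically central document becomes the prediction target and the remaining documents become the input. The centrality measure used here (mean bag-of-words cosine similarity to the other documents in the group) is an assumption chosen for simplicity, not necessarily one of the measures the paper evaluates, and the function names `centrality_scores` and `make_masked_example` are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercased whitespace tokens as a term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centrality_scores(documents):
    """Assumed centrality: mean cosine similarity to every other document."""
    vectors = [bag_of_words(d) for d in documents]
    scores = []
    for i, vi in enumerate(vectors):
        others = [cosine(vi, vj) for j, vj in enumerate(vectors) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    return scores


def make_masked_example(documents):
    """Mask out the most central document; return (sources, target)."""
    scores = centrality_scores(documents)
    target_index = max(range(len(documents)), key=scores.__getitem__)
    sources = [d for i, d in enumerate(documents) if i != target_index]
    return sources, documents[target_index]


group = [
    "storm hits coast",
    "storm hits coast roads flooded",
    "roads flooded downtown",
]
sources, target = make_masked_example(group)
print(target)
```

In this toy group the second document shares vocabulary with both of the others, so it is selected as the target. A summarization model would then be trained to generate `target` from `sources`, with no reference summary involved.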