AI · Oct 9, 2023

Abstractive Summarization of Large Document Collections Using GPT

arXiv:2310.05690v1 · 15 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the problem of summarizing large document collections for data analysis. The contribution is incremental: it builds on existing methods and reports modest performance gains.

The paper tackles abstractive summarization for large document collections by combining semantic clustering, GPT-based summarization, and visualization. Its ROUGE scores are statistically equivalent to those of BART and PEGASUS on the CNN/Daily Mail test dataset, and to those of BART on the Gigaword test dataset.

This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents. Our approach applies a combination of semantic clustering, document size reduction within topic clusters, semantic chunking of a cluster's documents, GPT-based summarization and concatenation, and a combined sentiment and text visualization of each topic to support exploratory data analysis. Statistical comparison of our results to existing state-of-the-art systems BART, BRIO, PEGASUS, and MoCa using ROUGE summary scores showed statistically equivalent performance with BART and PEGASUS on the CNN/Daily Mail test dataset, and with BART on the Gigaword test dataset. This finding is promising since we view document collection summarization as more challenging than individual document summarization. We conclude with a discussion of how issues of scale are
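The cluster, chunk, summarize, and concatenate steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. Everything here is an assumption made for illustration: bag-of-words cosine similarity stands in for semantic clustering, a word-budget sentence packer stands in for semantic chunking, and the `summarize` argument is a hypothetical callable where a GPT request would go (`first_sentence` is only a placeholder). Document size reduction and the sentiment and text visualization are omitted.

```python
import math
import re
from collections import Counter
from typing import Callable, List

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "from", "were", "was", "is"}


def bag_of_words(text: str) -> Counter:
    """Lowercased content-word counts; a crude stand-in for a semantic embedding."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_documents(docs: List[str], threshold: float = 0.3) -> List[List[str]]:
    """Greedy single-pass clustering: a document joins the first cluster whose
    seed document is at least `threshold` similar, else it starts a new cluster."""
    clusters: List[List[str]] = []
    seeds: List[Counter] = []
    for doc in docs:
        vec = bag_of_words(doc)
        for i, seed in enumerate(seeds):
            if cosine(vec, seed) >= threshold:
                clusters[i].append(doc)
                break
        else:
            clusters.append([doc])
            seeds.append(vec)
    return clusters


def chunk_sentences(docs: List[str], max_words: int = 40) -> List[str]:
    """Pack whole sentences into chunks of at most `max_words` words.
    A single sentence longer than the budget becomes its own chunk."""
    sentences = [s for d in docs for s in re.split(r"(?<=[.!?])\s+", d.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for s in sentences:
        n = len(s.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_collection(
    docs: List[str],
    summarize: Callable[[str], str],
    threshold: float = 0.3,
    max_words: int = 40,
) -> List[str]:
    """Return one summary per cluster: summarize each chunk, then concatenate."""
    summaries = []
    for cluster in cluster_documents(docs, threshold):
        parts = [summarize(chunk) for chunk in chunk_sentences(cluster, max_words)]
        summaries.append(" ".join(parts))
    return summaries


def first_sentence(text: str) -> str:
    """Placeholder summarizer (hypothetical): returns the first sentence of the chunk."""
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]
```

Calling `summarize_collection(docs, first_sentence)` on a list of strings returns one concatenated summary per cluster. Passing the summarizer as a callable keeps the clustering and chunking logic independent of whichever model produces the per-chunk summaries.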
