CL · AI · Feb 9, 2023

Generating a Structured Summary of Numerous Academic Papers: Dataset and Method

arXiv:2302.04580v1 · 29 citations · h-index: 72
Originality: Incremental advance
AI Analysis

This paper addresses the need for automated structured summarization of many academic papers. The advance is incremental: it builds on multi-document summarization but introduces a new dataset and a method for handling long sequences.

The paper targets a gap in existing multi-document summarization datasets and methods: generating structured summaries from numerous academic papers. It introduces BigSurvey, a dataset built from over 7,000 survey papers and 430,000 reference abstracts, and CAST, a method reported to outperform various advanced summarization methods.

Writing a survey paper on one research topic usually needs to cover the salient content from numerous related papers, which can be modeled as a multi-document summarization (MDS) task. Existing MDS datasets usually focus on producing the structureless summary covering a few input documents. Meanwhile, previous structured summary generation works focus on summarizing a single document into a multi-section summary. These existing datasets and methods cannot meet the requirements of summarizing numerous academic papers into a structured summary. To deal with the scarcity of available data, we propose BigSurvey, the first large-scale dataset for generating comprehensive summaries of numerous academic papers on each topic. We collect target summaries from more than seven thousand survey papers and utilize their 430 thousand reference papers' abstracts as input documents. To organize the diverse content from dozens of input documents and ensure the efficiency of processing long text sequences, we propose a summarization method named category-based alignment and sparse transformer (CAST). The experimental results show that our CAST method outperforms various advanced summarization methods.
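The abstract motivates a sparse transformer by the cost of processing long text sequences. As a rough illustration of why sparsity helps, the sketch below compares the number of token pairs attended to under full attention and under a sliding-window pattern. The sliding-window pattern, the function names, and the toy sizes are assumptions chosen for illustration; the abstract does not state which sparsity pattern CAST uses.

```python
def sliding_window_mask(seq_len, window):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when token i may attend to token j, i.e. when the
    two positions are at most `window` apart. This is one common sparse
    pattern, assumed here; it is not taken from the paper.
    """
    return [[abs(i - j) <= window for j in range(seq_len)]
            for i in range(seq_len)]


def count_attended_pairs(mask):
    """Count how many (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


# Toy sizes (hypothetical): 8 tokens, each attending one step left and right.
seq_len, window = 8, 1
mask = sliding_window_mask(seq_len, window)

dense_pairs = seq_len * seq_len            # full attention grows quadratically
sparse_pairs = count_attended_pairs(mask)  # windowed attention grows linearly

print(f"dense: {dense_pairs} pairs, sliding window: {sparse_pairs} pairs")
```

With a fixed window, the number of attended pairs grows linearly with sequence length rather than quadratically, which is the usual argument for sparse attention when the input concatenates dozens of documents.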

Code Implementations: 1 repo
