SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section
The work addresses a gap in summarization tools for researchers who compile scientific surveys, though the contribution is incremental because it builds on existing summarization methods.
The paper tackles the problem of summarizing multiple scientific articles into survey sections. It introduces SurveySum, a new dataset for domain-specific summarization, and evaluates two pipelines; the results emphasize the importance of the retrieval stage and the effect of pipeline configuration on summary quality.
Document summarization is the task of condensing texts into concise, informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two pipelines for summarizing scientific articles into a section of a survey; and (3) an evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and the impact of different configurations on the quality of the generated summaries.