CL · Jun 4, 2019

TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

arXiv:1906.01351v21107 citations
AI Analysis

TalkSumm provides a scalable way to generate training data for scientific paper summarization, though the contribution is incremental because it builds on existing summarization methods.

The authors address the lack of large-scale training data for scientific paper summarization by automatically generating summaries from videos of conference talks. They built a dataset of 1716 papers with corresponding talk videos, and a model trained on it performs similarly to models trained on manually created summaries.

Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers, by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves similar performance as models trained on a dataset of summaries created manually. In addition, we validated the quality of our summaries by human experts.
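The abstract does not spell out how a talk is turned into a summary, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes the talk has already been transcribed to text and swaps in a plain word-overlap score: each paper sentence is scored by how often its words occur in the transcript, and the top-scoring sentences are kept in paper order. The function names, the stopword list, and the parameter `k` are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "we", "is",
             "for", "that", "this", "on", "with", "are", "by"}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize_from_transcript(paper_sentences, transcript, k=2):
    """Pick the k paper sentences that overlap most with the talk transcript.

    Score = average transcript frequency of the sentence's words.
    This is a stand-in heuristic, not the method proposed in the paper.
    """
    talk_counts = Counter(tokenize(transcript))
    scored = []
    for idx, sentence in enumerate(paper_sentences):
        tokens = tokenize(sentence)
        if not tokens:
            continue  # skip sentences with no content words
        score = sum(talk_counts[t] for t in tokens) / len(tokens)
        scored.append((score, idx))
    # Highest score first; ties broken by earlier position in the paper.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:k]
    # Return the selected sentences in their original paper order.
    return [paper_sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]
```

Under this heuristic, sentences the speaker actually talks about rise to the top, which mirrors the hypothesis in the abstract that a talk is a concise description of the paper's content.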

Code Implementations: 1 repo