CL · Jun 4, 2019

TalkSumm: A Dataset and Scalable Annotation Method for Scientific Paper Summarization Based on Conference Talks

Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki

arXiv:1906.01351v2 · Has Code

Originality: Incremental advance

AI Analysis

The paper offers a scalable way to generate training data for scientific paper summarization, though the contribution is incremental because it builds on existing summarization methods.

The authors address the lack of large-scale training data for scientific paper summarization by automatically generating summaries from videos of conference talks. They build a dataset of 1716 papers, and a model trained on it performs similarly to models trained on manually created summaries.

Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content, and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves performance similar to that of models trained on a dataset of manually created summaries. In addition, we validated the quality of our summaries with human experts.
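The abstract does not say how a talk is turned into a summary of the paper. The sketch below illustrates the general idea of using a talk as a guide for extractive summarization under loudly stated assumptions: a plain-text transcript of the talk is already available, transcript words are grouped into fixed-size windows, each window is assigned to the paper sentence sharing the most (non-stopword) tokens with it, and the sentences that attract the most windows become the summary. This lexical-overlap heuristic and every name in it (`tokenize`, `align_windows`, `talk_summary`, `STOPWORDS`, the example data) are hypothetical and are not the authors' alignment method.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "from", "in", "is", "of", "on",
             "so", "that", "the", "this", "to", "we", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed (no stemming)."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def align_windows(transcript: str, sentences: list[str], window: int = 5) -> Counter:
    """Assign each window of transcript tokens to the paper sentence whose
    vocabulary overlaps it most; return how many windows each sentence index got."""
    counts: Counter = Counter()
    if not sentences:
        return counts
    sent_vocab = [set(tokenize(s)) for s in sentences]
    words = tokenize(transcript)
    for start in range(0, len(words), window):
        chunk = set(words[start:start + window])
        overlaps = [len(chunk & vocab) for vocab in sent_vocab]
        # max() returns the first index on ties, i.e. the earliest sentence.
        best = max(range(len(sentences)), key=lambda i: overlaps[i])
        if overlaps[best] > 0:  # ignore windows that match no sentence
            counts[best] += 1
    return counts


def talk_summary(transcript: str, sentences: list[str], k: int = 2, window: int = 5) -> list[str]:
    """Pick the k sentences with the most aligned windows, returned in paper order."""
    counts = align_windows(transcript, sentences, window)
    top = sorted(counts, key=lambda i: (-counts[i], i))[:k]
    return [sentences[i] for i in sorted(top)]


# Toy usage example (made-up data, for illustration only).
paper = [
    "We propose a method that generates summaries from conference talks.",
    "Related work covers extractive summarization.",
    "We collected 1716 papers and their talk videos.",
    "The appendix lists hyperparameters.",
]
transcript = ("so in this talk we generate summaries from conference talks "
              "and we collected papers and talk videos for many papers")

if __name__ == "__main__":
    for sentence in talk_summary(transcript, paper, k=2, window=4):
        print(sentence)
```

On the toy data, the sentences about the method and the collected videos receive aligned windows while the related-work and appendix sentences receive none, so only the first and third sentences are selected. Fixed windows and exact token overlap keep the sketch short; they ignore word order, morphology ("generate" does not match "generates"), and synonyms, which a practical alignment would need to handle.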

