A Supervised Approach to Extractive Summarisation of Scientific Papers
This work addresses data scarcity in the automatic summarisation of scientific documents. Its modelling contribution is incremental, since it builds on existing neural and feature-based methods.
The authors address the lack of large datasets for summarising scientific papers by building a new dataset from the author-provided summaries of computer science publications. They also develop models that combine neural sentence encoding with traditional summarisation features, and these models significantly outperform established baselines.
Automatic summarisation is a popular approach to reducing a document to its main arguments. Recent research in the area has focused on neural approaches to summarisation, which can be very data-hungry. However, few large datasets exist, and none exist for the traditionally popular domain of scientific publications. This domain opens up challenging research avenues centred on encoding large, complex documents. In this paper, we introduce a new dataset for summarisation of computer science publications by exploiting a large resource of author-provided summaries, and we show straightforward ways to extend it further. We develop models on the dataset that use both neural sentence encoding and traditional summarisation features. We show that models which encode sentences together with their local and global context perform best, significantly outperforming well-established baseline methods.
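The abstract describes the pipeline only at a high level. The following is a minimal, stdlib-only sketch of its general shape under clearly stated assumptions, not the authors' implementation. It rests on four hypothetical choices, none of which come from the paper:

- A sentence is labelled positive when its unigram F1 with the author-provided summary reaches a threshold. This is a crude stand-in for a ROUGE-style overlap score.
- A hashed bag-of-words vector replaces the neural sentence encoder.
- Local context is the mean encoding of neighbouring sentences, and global context is the mean encoding of the whole document.
- Logistic regression stands in for the classifier.

All names (`label_sentences`, `encode`, `sentence_inputs`, `train`, `predict`, `DIM`), the threshold and the two handcrafted features are illustrative only.

```python
import math
import re
from collections import Counter

DIM = 16  # hypothetical size of the toy hashed sentence vector


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_f1(candidate, reference):
    """Unigram F1 between two token lists: a crude stand-in for ROUGE."""
    common = sum((Counter(candidate) & Counter(reference)).values())
    if common == 0:
        return 0.0
    precision, recall = common / len(candidate), common / len(reference)
    return 2 * precision * recall / (precision + recall)


def label_sentences(sentences, author_summary, threshold=0.3):
    """Hypothetical labelling rule: 1 if a sentence's unigram F1 with the
    author-provided summary reaches `threshold`, else 0."""
    ref = tokenize(author_summary)
    return [int(overlap_f1(tokenize(s), ref) >= threshold) for s in sentences]


def encode(tokens):
    """Toy sentence encoder: L2-normalised hashed bag of words.
    Stands in for a neural encoder; it is NOT the paper's model.
    sum(map(ord, ...)) is used instead of hash() so runs are deterministic."""
    vec = [0.0] * DIM
    for tok in tokens:
        vec[sum(map(ord, tok)) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def mean(vectors):
    """Element-wise mean of equal-length vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * DIM
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sentence_inputs(sentences, window=1):
    """Per sentence: own encoding + local context (mean of neighbours within
    `window`) + global context (mean of all sentences) + two traditional
    features (relative position, capped length)."""
    toks = [tokenize(s) for s in sentences]
    encs = [encode(t) for t in toks]
    doc = mean(encs)
    n = len(sentences)
    inputs = []
    for i in range(n):
        lo, hi = max(0, i - window), min(n, i + window + 1)
        neighbours = [encs[j] for j in range(lo, hi) if j != i]
        features = [i / max(1, n - 1), min(1.0, len(toks[i]) / 30.0)]
        inputs.append(encs[i] + mean(neighbours) + doc + features)
    return inputs


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(inputs, labels, epochs=200, lr=0.5):
    """Logistic regression fitted with full-batch gradient descent."""
    weights, bias = [0.0] * len(inputs[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(inputs, labels):
            err = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / len(inputs) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(inputs)
    return weights, bias


def predict(model, inputs):
    """Probability that each sentence belongs in the extractive summary."""
    weights, bias = model
    return [sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) for x in inputs]
```

A usage sketch could look like the following. It calls `labels = label_sentences(sentences, summary)`, then `model = train(sentence_inputs(sentences), labels)`, then `predict(model, sentence_inputs(sentences))`. You would then keep the top-k sentences in document order. Concatenating the sentence, local-context and global-context vectors with handcrafted features mirrors the abstract's description at the level of inputs only. A faithful reimplementation would replace `encode` with a learned encoder and train on the full dataset.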