Extraction of Salient Sentences from Labelled Documents
This work addresses the need for efficient, automated sentence extraction in document analysis, though its contribution appears incremental because it builds on existing methods.
The paper tackles the problem of extracting topic-relevant sentences from labelled documents. It develops a hierarchical convolutional document model whose architecture supports introspection of document structure, and it introduces a scalable evaluation technique that avoids human annotation of validation data.
We present a hierarchical convolutional document model with an architecture designed to support introspection of the document structure. Using this model, we show how to use visualisation techniques from the computer vision literature to identify and extract topic-relevant sentences. We also introduce a new scalable evaluation technique for automatic sentence extraction systems that avoids the need for time-consuming human annotation of validation data.
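The idea of scoring sentences by introspecting a trained document classifier can be illustrated with a small sketch. The abstract does not say which visualisation technique is used, so the example below assumes gradient-based saliency, one common technique from the computer vision literature. The model here is a toy stand-in and not the paper's architecture: a width-1 convolution over sentence embeddings, a ReLU, max-pooling over sentences, and a linear class score. The function name `sentence_saliency` and the parameters `S`, `W`, and `v` are hypothetical.

```python
import numpy as np

def sentence_saliency(S, W, v):
    """Toy gradient-based saliency over sentences (illustrative assumption).

    S: (n, d) array of sentence embeddings, one row per sentence.
    W: (k, d) array of width-1 convolution filters.
    v: (k,) array of weights for the class of interest.
    Returns the class score and an (n,) array of per-sentence saliency values.
    """
    S = np.asarray(S, dtype=float)
    W = np.asarray(W, dtype=float)
    v = np.asarray(v, dtype=float)

    H = np.maximum(S @ W.T, 0.0)   # (n, k) ReLU feature maps, one row per sentence
    winners = H.argmax(axis=0)     # sentence that wins max-pooling for each filter
    score = float(v @ H.max(axis=0))

    # Gradient of the score with respect to each sentence embedding.
    # Max-pooling routes gradient only to the winning sentence of each filter,
    # and ReLU passes it only where the activation is positive.
    grad = np.zeros_like(S)
    for k, i in enumerate(winners):
        if H[i, k] > 0:
            grad[i] += v[k] * W[k]

    # Saliency of a sentence = L2 norm of the gradient on its embedding.
    return score, np.linalg.norm(grad, axis=1)


# Usage: rank three toy "sentences" by saliency, most salient first.
S = [[1.0, 0.0], [0.0, 3.0], [0.5, 0.5]]
W = np.eye(2)
v = [1.0, 2.0]
score, saliency = sentence_saliency(S, W, v)
ranking = np.argsort(-saliency)
```

Under this assumption, sentences whose embeddings most influence the class score rank highest and would be the candidates for extraction. A real system would take gradients through the full hierarchical model instead of this single layer.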