AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
The paper addresses automated text shortening, a problem of interest to NLP researchers, but its contribution is incremental: it introduces a new dataset and initial models without broad state-of-the-art impact.
The authors treat text abridgement as an NLP task for the first time. They create AbLit, a dataset of aligned original and abridged passages from English literature, and develop models that predict the linguistic relations of these alignments and generate abridgements. Their results establish abridgement as a challenging task.
Creating an abridged version of a text involves shortening it while maintaining its linguistic qualities. In this paper, we examine this task from an NLP perspective for the first time. We present a new resource, AbLit, which is derived from abridged versions of English literature books. The dataset captures passage-level alignments between the original and abridged texts. We characterize the linguistic relations of these alignments, and create automated models to predict these relations as well as to generate abridgements for new texts. Our findings establish abridgement as a challenging task, motivating future resources and research. The dataset is available at github.com/roemmele/AbLit.
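To make the idea of a passage-level alignment concrete, the following is a minimal sketch of how one aligned pair might be represented and how each original token could be marked as kept or dropped. Everything here is an assumption for illustration: the `AlignedPassage` class, the function names, the whitespace tokenization, and the use of Python's `difflib` for matching are hypothetical and are not taken from the AbLit repository or the paper's method.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class AlignedPassage:
    """Hypothetical container for one original/abridged passage pair."""
    original: str
    abridged: str


def label_original_tokens(pair: AlignedPassage) -> List[Tuple[str, str]]:
    """Label each original token as 'preserved' or 'removed'.

    Assumption: tokens are whitespace-split, and a token counts as
    preserved when it lies inside a block that SequenceMatcher matches
    between the original and abridged token lists.
    """
    orig_tokens = pair.original.split()
    abr_tokens = pair.abridged.split()
    matcher = SequenceMatcher(a=orig_tokens, b=abr_tokens, autojunk=False)
    labels = ["removed"] * len(orig_tokens)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            labels[i] = "preserved"
    return list(zip(orig_tokens, labels))


def preserved_ratio(pair: AlignedPassage) -> float:
    """Fraction of original tokens that survive in the abridgement."""
    labeled = label_original_tokens(pair)
    if not labeled:
        return 0.0
    kept = sum(1 for _, label in labeled if label == "preserved")
    return kept / len(labeled)


if __name__ == "__main__":
    # Invented example pair, not drawn from the dataset.
    pair = AlignedPassage(
        original="The old man walked very slowly home",
        abridged="The man walked home",
    )
    for token, label in label_original_tokens(pair):
        print(f"{token}\t{label}")
    print(f"preserved ratio: {preserved_ratio(pair):.2f}")
```

A token-level diff of this kind only captures verbatim preservation and removal. It would not capture rewording or reordering, which a fuller characterization of the relations between original and abridged text would need to handle.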