CL | May 31, 2022

NEWTS: A Corpus for News Topic-Focused Summarization

arXiv:2205.15661v1651 citations | h-index: 26
Originality: Synthesis-oriented
AI Analysis

The paper addresses the lack of datasets for topic-focused summarization, enabling evaluation of models that can condition summaries on specific themes. The contribution is incremental in that it builds on existing summarization datasets.

The paper introduces NEWTS, the first corpus for topic-focused summarization. It is based on CNN/Dailymail and annotated via crowd-sourcing, with each source article paired with two reference summaries that focus on different themes. The paper also evaluates existing techniques and prompting methods.

Text summarization models are approaching human levels of fidelity. Existing benchmarking corpora provide concordant pairs of full and abridged versions of Web, news, or professional content. To date, all summarization datasets operate under a one-size-fits-all paradigm that may not reflect the full range of organic summarization needs. Several recently proposed models (e.g., plug and play language models) have the capacity to condition the generated summaries on a desired range of themes. These capacities remain largely unused and unevaluated as there is no dedicated dataset that would support the task of topic-focused summarization. This paper introduces the first topical summarization corpus NEWTS, based on the well-known CNN/Dailymail dataset, and annotated via online crowd-sourcing. Each source article is paired with two reference summaries, each focusing on a different theme of the source document. We evaluate a representative range of existing techniques and analyze the effectiveness of different prompting methods.
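To make the corpus layout described in the abstract concrete, here is a minimal Python sketch of one article paired with two topic-focused reference summaries, plus one possible way to turn a topic into a prompt. The class names, field names, prompt wording, and placeholder data are all assumptions for illustration. They are not the actual NEWTS schema or the prompting methods evaluated in the paper.

```python
from dataclasses import dataclass


@dataclass
class TopicSummary:
    # Hypothetical fields: a theme described by a few words, plus the
    # reference summary written with that theme in focus.
    topic_words: list[str]
    summary: str


@dataclass
class TopicFocusedExample:
    # One source article with exactly two reference summaries,
    # each focusing on a different theme of the article.
    article: str
    references: tuple[TopicSummary, TopicSummary]


def build_prompt(article: str, topic: TopicSummary) -> str:
    """Prepend a topic description to the article (one assumed prompting style)."""
    keywords = ", ".join(topic.topic_words)
    return f"Summarize the article with a focus on: {keywords}\n\n{article}"


# Placeholder data only; not taken from the corpus.
example = TopicFocusedExample(
    article="(article text)",
    references=(
        TopicSummary(["election", "vote"], "(summary focused on the first theme)"),
        TopicSummary(["economy", "jobs"], "(summary focused on the second theme)"),
    ),
)

# One prompt per theme: the same article yields two different summarization tasks.
prompts = [build_prompt(example.article, ref) for ref in example.references]
```

Under these assumptions, a model's output for each prompt would be scored against the reference summary of the matching theme, not against a single topic-agnostic summary.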

