CL · Sep 1, 2020

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

arXiv:2009.00590v2 · 668 citations
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in summarization research, though it is an incremental advance, since it builds on existing alignment tasks.

The paper tackles the problem of aligning sentences in reference summaries with their source documents by framing it as a supervised classification task at the proposition-span level, yielding better alignment quality than unsupervised methods.

Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these data, we present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.
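The abstract contrasts the proposed supervised, proposition-level alignment with heuristic ROUGE-based unsupervised alignment. As a minimal sketch of that kind of baseline (not the paper's code, and not an official ROUGE implementation), the Python below scores every source span against each summary span with a simplified unigram-overlap F1 and keeps the best match if it clears a fixed threshold. The function names `rouge1_f1` and `align_by_rouge`, the lowercased whitespace tokenization, and the 0.5 threshold are hypothetical choices made for illustration.

```python
from collections import Counter
from typing import List, Tuple


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: lowercased whitespace tokens, no stemming.

    Hypothetical helper for illustration, not an official ROUGE package.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align_by_rouge(
    summary_spans: List[str],
    source_spans: List[str],
    threshold: float = 0.5,  # hypothetical, untuned cutoff
) -> List[Tuple[int, int, float]]:
    """Heuristic unsupervised alignment: pair each summary span with its
    highest-scoring source span when that score reaches the threshold.

    Returns (summary_index, source_index, score) triples.
    """
    alignments = []
    for i, summ in enumerate(summary_spans):
        scores = [rouge1_f1(src, summ) for src in source_spans]
        if not scores:
            continue
        best_j = max(range(len(scores)), key=scores.__getitem__)
        if scores[best_j] >= threshold:
            alignments.append((i, best_j, scores[best_j]))
    return alignments


if __name__ == "__main__":
    summary = ["the company reported record profits"]
    source = [
        "shares fell on monday",
        "the company reported record quarterly profits on tuesday",
    ]
    print(align_by_rouge(summary, source))
    # prints [(0, 1, 0.7692307692307693)]
```

The fixed threshold and purely lexical score are the kind of untuned heuristic that, according to the abstract, was never independently optimized or evaluated; the paper's alternative is to learn the aligned/not-aligned decision with a supervised classifier over proposition spans.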

Code Implementations: 1 repo
