CL Sep 1, 2020

Summary-Source Proposition-level Alignment: Task, Datasets and Supervised Baseline

Ori Ernst, Ori Shapira, Ramakanth Pasunuru, Michael Lepioshkin, Jacob Goldberger, Mohit Bansal, Ido Dagan

arXiv:2009.00590v227.1668 citations Has Code

Originality Incremental advance

AI Analysis

This work addresses a specific bottleneck in summarization research, though it is incremental in that it builds on existing alignment tasks.

The paper tackles the problem of aligning reference summaries with their source documents by framing alignment as a supervised classification task at the proposition span level, and reports improved alignment quality over unsupervised methods.

Aligning sentences in a reference summary with their counterparts in source documents was shown as a useful auxiliary summarization task, notably for generating training data for salience detection. Despite its assessed utility, the alignment step was mostly approached with heuristic unsupervised methods, typically ROUGE-based, and was never independently optimized or evaluated. In this paper, we propose establishing summary-source alignment as an explicit task, while introducing two major novelties: (1) applying it at the more accurate proposition span level, and (2) approaching it as a supervised classification task. To that end, we created a novel training dataset for proposition-level alignment, derived automatically from available summarization evaluation data. In addition, we crowdsourced dev and test datasets, enabling model development and proper evaluation. Utilizing these data, we present a supervised proposition alignment baseline model, showing improved alignment-quality over the unsupervised approach.
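To make the contrast in the abstract concrete, the following is a minimal sketch of the kind of heuristic, unsupervised, ROUGE-style alignment it describes as the prior approach. It is not the paper's model or code. The function names, the unigram-overlap F1 used as a stand-in for ROUGE-1, the `threshold` value, and the assumption that proposition spans arrive as plain strings are all hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase and split into alphanumeric tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two spans, a rough stand-in for ROUGE-1."""
    cand, ref = Counter(tokens(candidate)), Counter(tokens(reference))
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def align(summary_spans, source_spans, threshold=0.5):
    """Return (summary_index, source_index, score) for pairs scoring >= threshold.

    The threshold is a hypothetical value, not one taken from the paper.
    """
    pairs = []
    for i, summary_span in enumerate(summary_spans):
        for j, source_span in enumerate(source_spans):
            score = rouge1_f1(source_span, summary_span)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


summary_spans = ["the storm closed the airport"]
source_spans = [
    "The storm closed the airport on Monday",
    "Officials praised the rescue teams",
]
pairs = align(summary_spans, source_spans)
```

In a supervised formulation like the one the abstract proposes, the fixed overlap score and threshold would presumably be replaced by a classifier trained on labeled span pairs; the pairing loop itself could stay the same.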
