CL · Apr 6, 2021

Attention Head Masking for Inference Time Content Selection in Abstractive Summarization

arXiv:2104.02205v1730 citations
AI Analysis

This work addresses content selection in abstractive summarization with a data-efficient method that improves results on the CNN/Daily Mail and New York Times datasets, though it appears incremental because it builds on existing Transformer architectures.

The authors tackle content selection in Transformer-based abstractive summarization by introducing an attention head masking technique applied at inference time. The resulting models outperform prior state-of-the-art models on the CNN/Daily Mail and New York Times datasets, and need only 20% of the training samples to surpass BART fine-tuned on the full CNN/Daily Mail dataset.

How can we effectively inform content selection in Transformer-based abstractive summarization models? In this work, we present a simple yet effective attention head masking technique, which is applied to encoder-decoder attentions to pinpoint salient content at inference time. Using attention head masking, we are able to reveal the relation between encoder-decoder attentions and the content selection behaviors of summarization models. We then demonstrate its effectiveness on three document summarization datasets in both in-domain and cross-domain settings. Importantly, our models outperform prior state-of-the-art models on the CNN/Daily Mail and New York Times datasets. Moreover, our inference-time masking technique is also data-efficient, requiring only 20% of the training samples to outperform BART fine-tuned on the full CNN/Daily Mail dataset.
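
The abstract does not spell out the masking mechanics, so the following is only a minimal NumPy sketch of what masking encoder-decoder attention heads at inference time could look like. The function name `masked_cross_attention`, the boolean `salient` vector over source tokens, the `masked_heads` argument, and the choice to set blocked scores to negative infinity before the softmax are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; exp(-inf) evaluates to exactly 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def masked_cross_attention(q, k, v, salient, masked_heads):
    """Hypothetical encoder-decoder attention with per-head source masking.

    q:            (heads, tgt_len, d) decoder queries
    k, v:         (heads, src_len, d) encoder keys and values
    salient:      (src_len,) bool, True for source tokens judged salient
    masked_heads: head indices on which non-salient tokens are blocked
    """
    salient = np.asarray(salient, dtype=bool)
    if not salient.any():
        raise ValueError("at least one source token must be salient")

    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)  # (heads, tgt, src)

    # Assumption: a masked head may only attend to salient source tokens.
    block = np.zeros(scores.shape, dtype=bool)
    head_idx = np.asarray(list(masked_heads), dtype=int)
    block[head_idx] = ~salient  # broadcasts over the target positions

    scores = np.where(block, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy example: 2 heads, 3 target steps, 4 source tokens; only head 0 is masked.
rng = np.random.default_rng(0)
heads, tgt_len, src_len, d = 2, 3, 4, 8
q = rng.normal(size=(heads, tgt_len, d))
k = rng.normal(size=(heads, src_len, d))
v = rng.normal(size=(heads, src_len, d))
salient = np.array([True, False, True, False])
context, weights = masked_cross_attention(q, k, v, salient, masked_heads=[0])
```

In this sketch the model weights are untouched and only the attention scores change at decoding time, which is one plausible reading of "inference-time" masking. How the salient tokens and the heads to mask are chosen is left open here.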

