CL · Apr 5, 2021

Efficient Attentions for Long Document Summarization

Luyang Huang, Shuyang Cao, Nikolaus Parulian, Heng Ji, Lu Wang

arXiv:2104.02112v2 · 34.4792 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses a computational bottleneck faced by NLP researchers and practitioners who work with long documents. It represents a strong, targeted advance rather than a foundational breakthrough.

The paper tackles the scalability limits of Transformers for long document summarization by proposing Hepos, an efficient encoder-decoder attention method. Hepos lets models process ten times more tokens than models that use full attention and achieves higher ROUGE scores, including state-of-the-art results on PubMed.
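The scalability problem comes from the quadratic cost of full attention, which the abstract below states directly. The short sketch that follows counts the query-key scores computed by one layer of dense multi-head self-attention to show why ten times more tokens is hard with full attention. The input lengths (1,024 and 10,240 tokens) and the head count are illustrative assumptions, not figures reported in the paper.

```python
def full_self_attention_scores(seq_len: int, num_heads: int = 16) -> int:
    """Count query-key scores computed by one layer of dense multi-head self-attention.

    Each of the seq_len queries is scored against each of the seq_len keys in every
    head, so the count (and the memory for the score matrix) grows with seq_len ** 2.
    """
    return num_heads * seq_len * seq_len


# Illustrative lengths (assumed): a 1,024-token input and one ten times longer.
short_len, long_len = 1_024, 10_240
growth = full_self_attention_scores(long_len) // full_self_attention_scores(short_len)
print(f"10x more tokens -> {growth}x more attention scores per layer")
```

Because the growth is quadratic, a tenfold longer input multiplies the score count by a hundred, which is why efficient attention variants are needed to reach such lengths within a fixed memory budget.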

The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to effectively pinpoint salient information from the source. We further conduct a systematic study of existing efficient self-attentions. Combined with Hepos, we are able to process ten times more tokens than existing models that use full attentions. For evaluation, we present a new dataset, GovReport, with significantly longer documents and summaries. Results show that our models produce significantly higher ROUGE scores than competitive comparisons, including new state-of-the-art results on PubMed. Human evaluation also shows that our models generate more informative summaries with fewer unfaithful errors.
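The head-wise positional strides described in the abstract can be sketched for a single encoder-decoder attention head. This is a minimal illustration under stated assumptions, not the authors' released implementation: it assumes that head `h` starts at source offset `h % stride` and then attends to every `stride`-th source token, with one stride shared by all heads. The function names `strided_positions` and `hepos_head_attention` and the toy inputs are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def strided_positions(head: int, src_len: int, stride: int) -> range:
    """Source positions that `head` attends to.

    Assumption (illustrative): start at offset head % stride, then step by stride,
    so heads with different offsets jointly cover the whole source.
    """
    return range(head % stride, src_len, stride)


def hepos_head_attention(query: Vector, keys: Sequence[Vector],
                         values: Sequence[Vector], head: int, stride: int) -> List[float]:
    """Scaled dot-product attention for one head over its strided subset of source tokens.

    Only about len(keys) / stride keys are scored, so the cost per decoder query
    drops by roughly a factor of `stride` compared with full attention.
    """
    positions = strided_positions(head, len(keys), stride)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, keys[j])) / scale for j in positions]
    # Numerically stable softmax over the strided scores only.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the values at the same strided positions.
    return [sum(w * values[j][i] for w, j in zip(weights, positions))
            for i in range(len(values[0]))]


# Usage with toy data (assumed): 10 source tokens, stride 2, 4 heads.
src_len, stride = 10, 2
for h in range(4):
    print(f"head {h}: {list(strided_positions(h, src_len, stride))}")

keys = [[0.0, 0.0] for _ in range(src_len)]    # identical keys give uniform weights
values = [[float(j)] for j in range(src_len)]  # each value is its position index
print(hepos_head_attention([1.0, 0.0], keys, values, head=0, stride=stride))
```

Gathering only the strided keys and values, rather than masking a full score matrix, is what reduces the work: each head scores about `src_len / stride` tokens, while the set of heads together still sees every source position.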

