Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
This work addresses the challenge of generating accurate summaries by helping the model focus on the key information in the source text, an incremental improvement for abstractive summarization.
The study addresses abstractive summarization by combining pre-trained sequence-to-sequence models with saliency models that identify the important parts of the source text. Most combinations improve on a simple fine-tuned model, and the proposed model exceeds the previous best model by 1.33 points on ROUGE-L for the CNN/DM dataset.
Pre-trained sequence-to-sequence (seq-to-seq) models have significantly improved the accuracy of several language generation tasks, including abstractive summarization. Although fine-tuning these models has greatly improved the fluency of abstractive summaries, it is not clear whether they can also identify the important parts of the source text that should be included in the summary. In this study, we investigated, through extensive experiments, the effectiveness of combining pre-trained seq-to-seq models with saliency models that identify the important parts of the source text. We also proposed a new combination model consisting of a saliency model that extracts a token sequence from the source text and a seq-to-seq model that takes this sequence as an additional input text. Experimental results showed that most of the combination models outperformed a simple fine-tuned seq-to-seq model on both the CNN/DM and XSum datasets, even when the seq-to-seq model was pre-trained on large-scale corpora. Moreover, on the CNN/DM dataset, the proposed combination model exceeded the previous best-performing model by 1.33 points on ROUGE-L.
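The input construction behind the proposed combination model can be illustrated with a minimal sketch. It is not the authors' implementation: the frequency-based scorer is a toy stand-in for a trained saliency model, and the function names, the `[SEP]` separator, the value of `k`, and the choice to place the extracted tokens before the source are all assumptions made for illustration. The combined string is what would then be passed to a fine-tuned seq-to-seq model.

```python
from collections import Counter

SEP = "[SEP]"  # hypothetical separator token; the abstract does not specify one


def saliency_scores(tokens):
    """Toy stand-in for a trained saliency model (assumption): score each
    token by its frequency in the source, giving zero to short tokens."""
    counts = Counter(t.lower() for t in tokens)
    return [counts[t.lower()] if len(t) > 3 else 0 for t in tokens]


def extract_salient_tokens(tokens, k):
    """Keep the k highest-scoring tokens, preserving their source order."""
    scores = saliency_scores(tokens)
    # Rank indices by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:k])
    return [tokens[i] for i in keep]


def build_combined_input(source, k=5):
    """Join the extracted token sequence and the source into one input text.
    Placing the extracted tokens first is an assumption of this sketch."""
    tokens = source.split()
    salient = extract_salient_tokens(tokens, k)
    return " ".join(salient) + f" {SEP} " + source


source = "the storm hit the coast and the storm closed the coast road"
combined = build_combined_input(source, k=4)
print(combined)
```

In a real system the scorer would be a learned model and the tokens would be subwords from the seq-to-seq model's tokenizer. The sketch only shows how an extracted token sequence can serve as additional input alongside the unmodified source.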