CL, AI | Dec 3, 2021

The Influence of Data Pre-processing and Post-processing on Long Document Summarization

Xinwei Du, Kailun Dong, Yuchen Zhang, Yongsheng Li, Ruei-Yu Tsay

arXiv:2112.01660v1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of improving summarization quality for NLP practitioners, but it is incremental: it builds on existing models without introducing new paradigms.

The paper tackles long document summarization by investigating how data pre-processing and post-processing methods affect model performance. It finds that these methods can improve results, though specific numbers are not provided.

Long document summarization is an important and difficult task in natural language processing. Good performance on long document summarization indicates that a model has a decent understanding of human language. Currently, most research focuses on modifying the attention mechanism of the transformer to achieve a higher ROUGE score. Studies of data pre-processing and post-processing are relatively few. In this paper, we use two pre-processing methods and one post-processing method and analyze their effect on various long document summarization models.
