CL AI May 21, 2023

Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification

arXiv:2305.12463v126.6224 citations Has Code

Originality: Incremental advance

AI Analysis

The paper addresses text simplification for NLP applications, but the advance is incremental because it builds on existing pre-trained models such as BART.

The paper argues that pre-trained models underperform on text simplification because randomly masking spans of ordinary text does not teach them to generate simple text. It proposes a continued pre-training strategy and applies it to BART to obtain SimpleBART, which consistently and significantly improves on BART for lexical, sentence-level, and document-level simplification.

Randomly masking text spans in ordinary texts in the pre-training stage hardly allows models to acquire the ability to generate simple texts. It can hurt the performance of pre-trained models on text simplification tasks. In this paper, we propose a new continued pre-training strategy to teach the pre-trained model to generate simple texts. We continue pre-training BART, a representative model, to obtain SimpleBART. It consistently and significantly improves the results on lexical simplification, sentence simplification, and document-level simplification tasks over BART. At the end, we compare SimpleBART with several representative large language models (LLMs).
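The contrast the abstract draws, random span masking versus a masking rule aimed at simple text, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the paper's procedure: `random_span_mask` is a simplified stand-in for BART-style text infilling (it uses uniform span lengths), while `targeted_mask`, `SIMPLE_VOCAB`, and the example sentence are hypothetical names introduced only to show what a rule that favors simple words might look like.

```python
import random

MASK = "<mask>"

# Hypothetical word list; a real system would need a proper lexicon of simple words.
SIMPLE_VOCAB = {"use", "help", "big", "show"}


def random_span_mask(tokens, mask_ratio=0.3, max_span=3, seed=0):
    """Replace randomly chosen spans with one mask token each.

    Simplified text infilling: span positions ignore how simple
    or complex the covered words are.
    """
    rng = random.Random(seed)
    budget = max(1, int(len(tokens) * mask_ratio))  # max tokens to hide
    out, i = [], 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = min(rng.randint(1, max_span), budget, len(tokens) - i)
            out.append(MASK)  # the whole span collapses to a single mask
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out


def targeted_mask(tokens, simple_vocab):
    """Assumed alternative: mask only words found in simple_vocab,
    so the reconstruction target consists of simple words."""
    return [MASK if t.lower() in simple_vocab else t for t in tokens]


sentence = "We use a big model to help readers".split()
print(random_span_mask(sentence))
print(targeted_mask(sentence, SIMPLE_VOCAB))
```

In the first function the model may be asked to reconstruct any span, simple or not; in the second, every reconstruction target is a simple word, which is one plausible way to bias a pre-training objective toward simple output.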
