CL | AI | May 21, 2023

Teaching the Pre-trained Model to Generate Simple Texts for Text Simplification

arXiv:2305.12463v1224 citations
Originality: Incremental advance
AI Analysis

This paper addresses text simplification for NLP applications, but the advance is incremental because it builds on existing pre-trained models such as BART.

The paper tackles the problem of pre-trained models underperforming on text simplification tasks due to random masking strategies, and proposes SimpleBART, a continued pre-training approach that consistently and significantly improves results on lexical, sentence, and document-level simplification tasks over BART.

Randomly masking text spans in ordinary texts in the pre-training stage hardly allows models to acquire the ability to generate simple texts. It can hurt the performance of pre-trained models on text simplification tasks. In this paper, we propose a new continued pre-training strategy to teach the pre-trained model to generate simple texts. We continue pre-training BART, a representative model, to obtain SimpleBART. It consistently and significantly improves the results on lexical simplification, sentence simplification, and document-level simplification tasks over BART. At the end, we compare SimpleBART with several representative large language models (LLMs).
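
To make the criticized baseline concrete, here is a minimal sketch of random span masking in the style of BART's text infilling, where each hidden span is replaced by a single mask token. It is not the SimpleBART strategy, which the abstract does not detail. The function name `mask_random_spans`, the `mask_ratio` and `max_span` parameters, and the uniform span lengths are assumptions made for illustration.

```python
import random

MASK = "<mask>"  # placeholder mask token, assumed for illustration


def mask_random_spans(tokens, mask_ratio=0.3, max_span=3, seed=None):
    """Replace randomly chosen token spans with one mask token each.

    Span positions are chosen without regard to word difficulty, which is
    the property the abstract argues is unhelpful for simplification.
    """
    rng = random.Random(seed)
    budget = int(len(tokens) * mask_ratio)  # max number of tokens to hide
    out = []
    i = 0
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            # Uniform span length here; a simplification chosen for brevity.
            span = min(rng.randint(1, max_span), budget, len(tokens) - i)
            out.append(MASK)  # the whole span collapses to a single mask
            i += span
            budget -= span
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    sentence = "the committee postponed the vote due to insufficient attendance"
    print(" ".join(mask_random_spans(sentence.split(), seed=0)))
```

Because the masked positions are sampled uniformly, the model is as likely to reconstruct a rare, complex word as a simple one, which matches the abstract's argument that this objective does little to teach simple-text generation.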

Code Implementations: 1 repo