CL, Dec 30, 2020

Enhancing Pre-trained Language Model with Lexical Simplification

Rongzhou Bao, Jiayi Wang, Zhuosheng Zhang, Hai Zhao

arXiv:2012.15070v1, 0.31 citations, h-index: 42

Originality: Incremental advance

AI Analysis

This work provides an incremental improvement for pre-trained language models in text classification by enhancing their ability to handle lexical diversity.

This paper addresses the problem of lexical diversity causing confusion and inaccuracy in pre-trained language models (PrLMs) by proposing a novel approach that leverages lexical simplification (LS). By applying a rule-based simplification process to sentences and using the simplified versions as auxiliary inputs, the method improves the performance of strong PrLMs like BERT and ELECTRA in various text classification tasks.

For both human readers and pre-trained language models (PrLMs), lexical diversity may lead to confusion and inaccuracy when understanding the underlying semantic meanings of given sentences. Lexical simplification (LS), which substitutes complex words with simpler alternatives, is a recognized method for reducing such lexical diversity and thereby improving the understandability of sentences. In this paper, we leverage LS and propose a novel approach that effectively improves the performance of PrLMs in text classification. A rule-based simplification process is applied to a given sentence, and PrLMs are encouraged to predict the real label of the sentence with auxiliary inputs from the simplified version. Even with strong PrLMs (BERT and ELECTRA) as baselines, our approach further improves performance on various text classification tasks.
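The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the substitution table `SIMPLE_SUBSTITUTES`, the helper names `simplify` and `build_input`, and the choice to join the two sentences with a `[SEP]` marker are all assumptions made for illustration, since the summary above does not specify the simplification rules or how the auxiliary input is fed to the model.

```python
import re

# Hypothetical substitution table (complex word -> simpler alternative).
# The paper's actual rule set is not given in this summary.
SIMPLE_SUBSTITUTES = {
    "utilize": "use",
    "commence": "begin",
    "purchase": "buy",
    "assist": "help",
}


def simplify(sentence: str) -> str:
    """Replace every word found in the table with its simpler alternative."""

    def replace(match: re.Match) -> str:
        word = match.group(0)
        simple = SIMPLE_SUBSTITUTES.get(word.lower())
        if simple is None:
            return word  # word is not in the table, keep it unchanged
        # Keep a leading capital so sentence-initial words still look right.
        return simple.capitalize() if word[0].isupper() else simple

    return re.sub(r"[A-Za-z]+", replace, sentence)


def build_input(sentence: str, sep: str = "[SEP]") -> str:
    """Pair the original sentence with its simplified version.

    Joining with a separator is an assumption; it mimics the sentence-pair
    input format that BERT-style classifiers accept.
    """
    return f"{sentence} {sep} {simplify(sentence)}"


if __name__ == "__main__":
    print(build_input("They purchase food"))
```

In this sketch the classifier would still be trained against the label of the original sentence; the simplified text only supplies extra context.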

