CL · AI · LG · May 13, 2022

Improving Contextual Representation with Gloss Regularized Pre-training

arXiv:2205.06603v1 · 628 citations · h-index: 21
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in NLP pre-training, aiming at better contextual representations, and offers incremental improvements over existing methods.

The paper tackles the discrepancy between pre-training and inference in BERT-like models by proposing a gloss regularizer module (GR-BERT) that enhances word semantic similarity. It achieves a new state of the art in lexical substitution and improves sentence representations on STS tasks.

Though they achieve impressive results on many NLP tasks, BERT-like masked language models (MLMs) suffer from a discrepancy between pre-training and inference. In light of this gap, we investigate the contextual representations used in pre-training and inference from the perspective of the word probability distribution. We find that BERT risks neglecting contextual word similarity during pre-training. To tackle this issue, we propose GR-BERT, which adds an auxiliary gloss regularizer module to BERT pre-training to enhance word semantic similarity. By simultaneously predicting masked words and aligning contextual embeddings to their corresponding glosses, the model explicitly captures word similarity. We design two architectures for GR-BERT and evaluate the model on downstream tasks. Experimental results show that the gloss regularizer benefits BERT in both word-level and sentence-level semantic representation. GR-BERT achieves a new state of the art on the lexical substitution task and substantially improves BERT's sentence representations on both unsupervised and supervised STS tasks.
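The joint objective described above (predict the masked word and align its contextual embedding to a gloss at the same time) can be sketched as the sum of two losses. This is a minimal sketch under stated assumptions, not the paper's formulation: the contrastive form of the alignment term, the use of cosine similarity, `temperature`, and the weight `lam` are assumptions, and every function name here is hypothetical. Gloss embeddings are taken as given vectors rather than produced by an encoder, and the two GR-BERT architectures are not modeled.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_softmax(logits: Vector) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def mlm_loss(vocab_logits: Vector, target_id: int) -> float:
    """Masked-word term: cross-entropy against the single gold token id."""
    return -log_softmax(vocab_logits)[target_id]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def gloss_alignment_loss(
    context_emb: Vector,
    gloss_embs: Sequence[Vector],
    gold_gloss: int,
    temperature: float = 0.1,
) -> float:
    """ASSUMED contrastive regularizer: pull the masked slot's contextual
    embedding toward the gloss of its gold sense and away from the others."""
    scores = [cosine(context_emb, g) / temperature for g in gloss_embs]
    return -log_softmax(scores)[gold_gloss]


def gr_loss(
    vocab_logits: Vector,
    target_id: int,
    context_emb: Vector,
    gloss_embs: Sequence[Vector],
    gold_gloss: int,
    lam: float = 1.0,
) -> float:
    """Hypothetical joint objective: MLM term plus lam times the gloss term."""
    return mlm_loss(vocab_logits, target_id) + lam * gloss_alignment_loss(
        context_emb, gloss_embs, gold_gloss
    )


if __name__ == "__main__":
    logits = [2.0, 0.5, -1.0]           # scores over a toy 3-word vocabulary
    ctx = [0.9, 0.1]                    # contextual embedding of the masked slot
    glosses = [[1.0, 0.0], [0.0, 1.0]]  # embeddings of two candidate glosses
    print(gr_loss(logits, 0, ctx, glosses, gold_gloss=0, lam=0.5))
```

Note that `mlm_loss` depends only on the probability assigned to the gold token: permuting the logits of the other words (for example, moving mass from a synonym to an unrelated word) leaves it unchanged. This is one way to read the claim that plain MLM pre-training does not explicitly model word similarity, and it is the gap the extra alignment term is meant to fill.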

