CL · AI · Oct 31, 2022

SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking

Xiaotian Zhang, Hang Yan, Yu Sun, Xipeng Qiu

arXiv:2210.17168v41.45 citations · h-index: 24

Originality: Incremental advance

AI Analysis

This paper addresses Chinese spell checking for users dealing with homophone ambiguity; it represents an incremental improvement over existing BERT-based approaches.

The paper tackles Chinese Spell Checking by proposing a token-level self-distillation contrastive learning method to adapt BERT for handling phonetic and graphemic information, achieving significant improvements on three datasets.

Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically utilize BERT for text encoding. However, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We employ BERT to encode both the corrupted sentence and its corresponding correct sentence. Then, we use a contrastive learning loss to regularize the hidden states of corrupted tokens to be closer to their counterparts in the correct sentence. On three CSC datasets, we confirm that our method provides a significant improvement over the baselines.
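The token-level contrastive objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no loss formula, so the InfoNCE-style form, the cosine similarity, the `temperature` value, and the choice of the other tokens in the correct sentence as negatives are all assumptions made for illustration. Plain Python lists stand in for BERT hidden states, and the function name `token_contrastive_loss` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def token_contrastive_loss(corrupted, correct, error_positions, temperature=0.1):
    """Mean InfoNCE-style loss over the corrupted token positions.

    corrupted / correct: per-token hidden states (lists of vectors) of the
    corrupted sentence and the correct sentence, aligned by position.
    error_positions: indices of the tokens that were corrupted.
    Assumption: the positive for position i is correct[i]; every other
    token in the correct sentence serves as a negative.
    """
    if not error_positions:
        return 0.0
    total = 0.0
    for i in error_positions:
        logits = [cosine(corrupted[i], c) / temperature for c in correct]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denominator - logits[i]
    return total / len(error_positions)


# Toy hidden states for a three-token sentence with an error at position 1.
correct = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[1.0, 0.0], [0.1, 0.9], [-1.0, 0.0]]  # corrupted token close to its counterpart
drifted = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]  # corrupted token far from its counterpart

loss_aligned = token_contrastive_loss(aligned, correct, [1])
loss_drifted = token_contrastive_loss(drifted, correct, [1])
```

In this toy setup the loss is smaller when the corrupted token's hidden state sits near its counterpart in the correct sentence, which is the direction the regularizer is meant to push. In an actual training loop the vectors would come from the encoder and the loss would be combined with the usual correction objective; how the paper weights or combines them is not stated in the abstract.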
