CL · AI · Oct 31, 2022

SDCL: Self-Distillation Contrastive Learning for Chinese Spell Checking

arXiv:2210.17168v4 · 5 citations · h-index: 24
Originality: Incremental advance
AI Analysis

This paper addresses Chinese spell checking, where homophones make errors ambiguous, and represents an incremental improvement over existing BERT-based approaches.

The paper tackles Chinese Spell Checking by proposing a token-level self-distillation contrastive learning method to adapt BERT for handling phonetic and graphemic information, achieving significant improvements on three datasets.

Due to the ambiguity of homophones, Chinese Spell Checking (CSC) has widespread applications. Existing systems typically utilize BERT for text encoding. However, CSC requires the model to account for both phonetic and graphemic information. To adapt BERT to the CSC task, we propose a token-level self-distillation contrastive learning method. We employ BERT to encode both the corrupted sentence and its corresponding correct sentence. Then, we use a contrastive learning loss to regularize the corrupted tokens' hidden states to be closer to their counterparts in the correct sentence. On three CSC datasets, we confirm that our method provides a significant improvement over the baselines.
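
The token-level objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE-style form, the cosine similarity, the temperature value, and the use of other tokens in the same correct sentence as negatives are all assumptions made for illustration, and the function names are hypothetical. Plain Python lists stand in for BERT hidden states.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def token_contrastive_loss(corrupted_states, correct_states,
                           corrupted_positions, temperature=0.1):
    """Assumed InfoNCE-style loss over the corrupted token positions.

    corrupted_states / correct_states: one hidden vector per token, taken
    from encoding the corrupted and the correct sentence respectively.
    corrupted_positions: indices where the two sentences differ.
    For position i, the positive is correct_states[i]; every other token
    of the correct sentence is treated as a negative (an assumption).
    """
    if len(corrupted_states) != len(correct_states):
        raise ValueError("sentences must have the same number of tokens")
    if not corrupted_positions:
        return 0.0

    total = 0.0
    for i in corrupted_positions:
        logits = [cosine(corrupted_states[i], target) / temperature
                  for target in correct_states]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(
            sum(math.exp(logit - peak) for logit in logits))
        # Negative log-probability of picking the aligned correct token.
        total += log_denominator - logits[i]
    return total / len(corrupted_positions)


# Toy 2-dimensional "hidden states" for a three-token sentence.
correct = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
aligned = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]     # token 1 matches its counterpart
misaligned = [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]]  # token 1 drifted toward token 0

loss_aligned = token_contrastive_loss(aligned, correct, [1])
loss_misaligned = token_contrastive_loss(misaligned, correct, [1])
```

In this toy setup the loss is small when the corrupted token's state sits next to its counterpart and large when it sits next to a different token, which is the pull the abstract describes. In a real model the two sets of states would come from the same BERT encoder, and this term would be combined with the usual correction loss.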

