CL · Aug 16, 2023

RSpell: Retrieval-augmented Framework for Domain Adaptive Chinese Spelling Check

arXiv:2308.08176v1 · 6 citations · h-index: 21 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of making CSC models effective across different domains, which is crucial for practical applications, though it appears incremental, as it builds on existing CSC methods by adding retrieval.

The paper tackles the problem of domain adaptation in Chinese Spelling Check (CSC) by proposing RSpell, a retrieval-augmented framework that incorporates domain-specific terms, achieving state-of-the-art performance in zero-shot and fine-tuning scenarios across law, medicine, and official document domains.

Chinese Spelling Check (CSC) refers to the detection and correction of spelling errors in Chinese texts. In practical applications, CSC models need to be able to correct errors across different domains. In this paper, we propose a retrieval-augmented spelling check framework called RSpell, which searches for corresponding domain terms and incorporates them into CSC models. Specifically, we employ pinyin fuzzy matching to search for terms, which are combined with the input and fed into the CSC model. Then, we introduce an adaptive process control mechanism to dynamically adjust the impact of external knowledge on the model. Additionally, we develop an iterative strategy for the RSpell framework to enhance reasoning capabilities. We conducted experiments on CSC datasets in three domains: law, medicine, and official document writing. The results show that RSpell achieves state-of-the-art performance in both zero-shot and fine-tuning scenarios, demonstrating the effectiveness of the retrieval-augmented CSC framework. Our code is available at https://github.com/47777777/Rspell.
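The retrieve, combine, and iterate loop described in the abstract can be illustrated with a small Python sketch. It is not the RSpell implementation from the linked repository. The toy pinyin table, the two-term lexicon, the fuzzy rules (zh/z, ch/c, sh/s, n/l, and -ng/-n, a commonly used set of fuzzy pinyin pairs), the `[SEP]`-joined input format, and `toy_corrector` (a stand-in for the neural CSC model) are all hypothetical. The adaptive process control mechanism is not sketched, because the abstract does not describe how it works.

```python
from typing import Callable

# Toy character-to-pinyin table (hypothetical; a real system would use a full
# pinyin converter and a domain lexicon of law/medicine/official-document terms).
PINYIN = {
    "患": "huan", "者": "zhe", "服": "fu", "用": "yong", "后": "hou",
    "阿": "a", "司": "si", "斯": "si", "匹": "pi", "林": "lin", "零": "ling",
    "诉": "su", "讼": "song", "时": "shi", "失": "shi", "效": "xiao",
}
LEXICON = ["阿司匹林", "诉讼时效"]  # hypothetical domain terms


def fuzzy(syllable: str) -> str:
    """Collapse commonly confused pinyin: zh/z, ch/c, sh/s, n/l, -ng/-n."""
    for retroflex, plain in (("zh", "z"), ("ch", "c"), ("sh", "s")):
        if syllable.startswith(retroflex):
            syllable = plain + syllable[len(retroflex):]
            break
    if syllable.startswith("n"):
        syllable = "l" + syllable[1:]
    for nasal in ("ang", "eng", "ing"):
        if syllable.endswith(nasal):
            syllable = syllable[:-1]  # e.g. "ling" -> "lin"
            break
    return syllable


def to_keys(text: str) -> list[str]:
    # One fuzzy key per character; characters missing from the table match only themselves.
    return [fuzzy(PINYIN[ch]) if ch in PINYIN else ch for ch in text]


def find_span(keys: list[str], term: str) -> int:
    """Return the start index of a fuzzy-pinyin match of term in keys, or -1."""
    term_keys = to_keys(term)
    n = len(term_keys)
    for i in range(len(keys) - n + 1):
        if keys[i:i + n] == term_keys:
            return i
    return -1


def retrieve_terms(text: str, lexicon: list[str]) -> list[str]:
    """Retrieve every lexicon term whose fuzzy pinyin occurs in the text."""
    keys = to_keys(text)
    return [term for term in lexicon if find_span(keys, term) >= 0]


def build_model_input(text: str, terms: list[str]) -> str:
    # Hypothetical input format: the sentence followed by the retrieved terms.
    return "[SEP]".join([text] + terms)


def toy_corrector(text: str, terms: list[str]) -> str:
    """Stand-in for the CSC model: overwrite each matched span with the term."""
    keys, chars = to_keys(text), list(text)
    for term in terms:
        start = find_span(keys, term)
        if start >= 0:
            chars[start:start + len(term)] = list(term)
    return "".join(chars)


def iterative_correct(text: str, lexicon: list[str],
                      corrector: Callable[[str, list[str]], str],
                      max_rounds: int = 3) -> str:
    """Re-retrieve and re-correct until the output stops changing."""
    for _ in range(max_rounds):
        corrected = corrector(text, retrieve_terms(text, lexicon))
        if corrected == text:
            break
        text = corrected
    return text


print(retrieve_terms("患者服用阿斯匹零后", LEXICON))  # ['阿司匹林']
print(build_model_input("患者服用阿斯匹零后", ["阿司匹林"]))  # 患者服用阿斯匹零后[SEP]阿司匹林
print(iterative_correct("诉讼失效已过", LEXICON, toy_corrector))  # 诉讼时效已过
```

In a real pipeline, `toy_corrector` would be replaced by a CSC model that consumes the output of `build_model_input`. The stopping rule (no change, or `max_rounds` reached) is also an assumption, chosen so that each round retrieves terms from the previous round's corrected output.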

Code Implementations: 1 repo
