CL · AI · Apr 14

CSRP: Chain-of-Thought Reasoning for Chinese Text Correction via Reinforcement Learning with Efficiency-Aware Rewards

arXiv:2606.00020 · 32.9 · h-index: 3 · Has Code

Predicted impact: top 27% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For researchers and practitioners in Chinese grammatical error correction, CSRP provides a method to improve precision and reduce over-correction, a known bottleneck in the field.

CSRP achieves state-of-the-art Chinese grammatical error correction on NACGEC, with 50.99 F0.5 and 57.17 precision. On CSCD spelling correction it reaches 59.61 F1, surpassing GPT-4 by 5.20 points. It combines continual pre-training, chain-of-thought supervised fine-tuning, and reinforcement learning with an efficiency-aware reward that penalizes unnecessary edits and so reduces over-correction.
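The F0.5 figure reported here weights precision more heavily than recall, which is why over-correction is costly under this metric. A minimal sketch of the standard F-beta formula follows; the function name `f_beta` and the precision/recall values in the usage lines are illustrative assumptions, not numbers from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favors precision, beta > 1 favors recall, beta = 1 gives F1.
    """
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom <= 0.0:
        return 0.0  # both precision and recall are zero
    return (1.0 + b2) * precision * recall / denom


# Illustrative values only (not results from the paper):
# a cautious system with high precision and low recall ...
cautious = f_beta(precision=0.6, recall=0.3)
# ... versus an aggressive system with the two swapped.
aggressive = f_beta(precision=0.3, recall=0.6)
print(f"cautious F0.5 = {cautious:.3f}, aggressive F0.5 = {aggressive:.3f}")
```

Both toy systems have the same F1, but the cautious one scores higher under F0.5, so a model that makes many unnecessary edits is penalized even when it also catches more true errors.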

Large Language Model (LLM) based Chinese Grammatical Error Correction (CGEC) systems face two critical challenges: general-purpose models lack specialized linguistic priors for subtle grammatical distinctions, and Supervised Fine-Tuning (SFT) with Maximum Likelihood Estimation (MLE) fails to optimize for precision-focused metrics, leading to systematic over-correction. We propose CSRP, a three-stage framework that progressively builds correction capability through Continual Pre-training (CPT) on 5.9M balanced samples to internalize domain knowledge, Chain-of-Thought SFT with explicit error reasoning for diagnostic transparency, and Group Relative Policy Optimization with a novel Efficiency-Aware Reward that explicitly penalizes unnecessary edits. On the NACGEC benchmark, CSRP achieves state-of-the-art performance with 50.99 $F_{0.5}$ and 57.17 precision, substantially outperforming previous best results while effectively mitigating the over-correction bias inherent in MLE-trained models. Our method also advances CSCD spelling correction to 59.61 F1, surpassing GPT-4 by 5.20 points. Comprehensive ablation studies demonstrate that the RL alignment stage contributes an 8% relative gain over the SFT baseline, and that this gain is orthogonal to the contribution of large-scale CPT, validating that explicit optimization for edit efficiency is essential for high-quality grammatical error correction. Our code is available at https://github.com/TW-NLP/ChineseErrorCorrector.
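The abstract does not give the exact form of the Efficiency-Aware Reward, so the following is only a sketch of one way such a reward could be combined with the group-relative advantage normalization used in Group Relative Policy Optimization. Everything specific here is an assumption for illustration: the helper names `extract_edits`, `efficiency_aware_reward`, and `group_relative_advantages`, the character-level edit extraction via `difflib`, the flat per-edit `penalty`, and the example sentences.

```python
import difflib
import statistics


def extract_edits(source: str, target: str) -> set:
    """Character-level edits that turn source into target.

    Each edit is (operation, start, end, replacement) in source coordinates.
    """
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    return {
        (tag, i1, i2, target[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def efficiency_aware_reward(source: str, hypothesis: str, reference: str,
                            penalty: float = 0.1) -> float:
    """Hypothetical reward: edit-level F0.5 minus a cost per unnecessary edit."""
    hyp = extract_edits(source, hypothesis)
    ref = extract_edits(source, reference)
    true_pos = len(hyp & ref)
    # Convention assumed here: an empty edit set counts as perfect on its side.
    precision = true_pos / len(hyp) if hyp else 1.0
    recall = true_pos / len(ref) if ref else 1.0
    denom = 0.25 * precision + recall
    f05 = 1.25 * precision * recall / denom if denom > 0 else 0.0
    unnecessary = len(hyp - ref)  # edits the reference did not make
    return f05 - penalty * unnecessary


def group_relative_advantages(rewards: list, eps: float = 1e-6) -> list:
    """Normalize each reward against its own sampling group (mean and std)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Illustrative group of sampled corrections for one source sentence.
source = "我今天去学校了了。"
reference = "我今天去学校了。"
candidates = [
    "我今天去学校了。",    # minimal, matches the reference
    "我今日去学校了。",    # correct fix plus an unnecessary rewrite
    "我今天去学校了了。",  # leaves the error in place
]
rewards = [efficiency_aware_reward(source, c, reference) for c in candidates]
advantages = group_relative_advantages(rewards)
for cand, r, a in zip(candidates, rewards, advantages):
    print(f"{cand}  reward={r:.3f}  advantage={a:+.3f}")
```

In this sketch the over-correcting candidate is ranked below the minimal one even though both fix the real error, which is the behavior an edit-efficiency penalty is meant to encourage. The released repository should be consulted for the reward the authors actually use.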
