GECToR -- Grammatical Error Correction: Tag, Not Rewrite
GECToR offers a more efficient approach to grammatical error correction, particularly for applications that need fast inference, though the contribution is incremental because it builds on existing sequence tagging methods.
The paper tackles grammatical error correction by tagging each input token with a correction instead of rewriting the whole sentence with a seq2seq model. Its best single model/ensemble achieves an $F_{0.5}$ of 65.3/66.5 on CoNLL-2014 (test) and 72.4/73.6 on BEA-2019 (test), with inference up to 10 times as fast as a Transformer-based seq2seq system.
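For reference, the reported metric is the standard $F_{\beta}$ score with $\beta = 0.5$, which weights precision $P$ more heavily than recall $R$. The symbols $P$ and $R$ are introduced here for illustration and are not values reported in the summary above:

$$F_{\beta} = \frac{(1 + \beta^{2})\, P\, R}{\beta^{2} P + R}, \qquad F_{0.5} = \frac{1.25\, P\, R}{0.25\, P + R}$$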
In this paper, we present a simple and efficient GEC sequence tagger using a Transformer encoder. Our system is pre-trained on synthetic data and then fine-tuned in two stages: first on errorful corpora, and second on a combination of errorful and error-free parallel corpora. We design custom token-level transformations to map input tokens to target corrections. Our best single-model/ensemble GEC tagger achieves an $F_{0.5}$ of 65.3/66.5 on CoNLL-2014 (test) and $F_{0.5}$ of 72.4/73.6 on BEA-2019 (test). Its inference speed is up to 10 times as fast as a Transformer-based seq2seq GEC system. The code and trained models are publicly available.
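The idea of token-level transformations can be illustrated with a minimal sketch. It assumes a small hypothetical tag set (`$KEEP`, `$DELETE`, `$APPEND_x`, `$REPLACE_x`) and a hypothetical helper `apply_tags`; the abstract does not spell out the tag inventory, so these names are assumptions chosen for illustration and not the released implementation. The tags are written by hand here, whereas in the system described above a Transformer encoder would predict them.

```python
from typing import List

# Hypothetical tag names, assumed for illustration only.
KEEP = "$KEEP"
DELETE = "$DELETE"
APPEND_PREFIX = "$APPEND_"
REPLACE_PREFIX = "$REPLACE_"


def apply_tags(tokens: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per source token and return the corrected tokens."""
    if len(tokens) != len(tags):
        raise ValueError("need exactly one tag per token")
    corrected: List[str] = []
    for token, tag in zip(tokens, tags):
        if tag == KEEP:
            corrected.append(token)                      # leave the token unchanged
        elif tag == DELETE:
            continue                                     # drop the token
        elif tag.startswith(APPEND_PREFIX):
            corrected.append(token)                      # keep the token ...
            corrected.append(tag[len(APPEND_PREFIX):])   # ... and add a new one after it
        elif tag.startswith(REPLACE_PREFIX):
            corrected.append(tag[len(REPLACE_PREFIX):])  # substitute the token
        else:
            raise ValueError(f"unknown tag: {tag}")
    return corrected


source = ["She", "go", "to", "to", "school", "every", "day"]
tags = ["$KEEP", "$REPLACE_goes", "$KEEP", "$DELETE", "$KEEP", "$KEEP", "$APPEND_."]
print(" ".join(apply_tags(source, tags)))
```

Because every source token receives its tag independently, all positions can be predicted in one pass, which is a plausible reason a tagger of this kind would be faster at inference than a seq2seq model that generates its output token by token.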