A Language Model for Grammatical Error Correction in L2 Russian
The paper addresses a domain-specific problem for learners and users of Russian as a second language, but the contribution appears incremental because it builds on existing language model approaches.
The paper tackles grammatical error correction for non-native (L2) Russian writing, where existing spellcheckers often fail. It proposes a language model trained on untagged newspaper texts and validates it against the RULEC-GEC corpus; the abstract does not report quantitative results.
Grammatical error correction is one of the fundamental tasks in Natural Language Processing. For the Russian language, most of the spellcheckers available correct typos and other simple errors with high accuracy, but often fail when faced with non-native (L2) writing, since the latter contains errors that are not typical for native speakers. In this paper, we propose a pipeline involving a language model intended for correcting errors in L2 Russian writing. The language model proposed is trained on untagged texts of the Newspaper subcorpus of the Russian National Corpus, and the quality of the model is validated against the RULEC-GEC corpus.
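The abstract does not say what kind of language model is used or how the pipeline applies it. As a rough illustration of the general idea only, the sketch below assumes a bigram model with add-one smoothing that ranks candidate corrections by log-probability. The function names, the toy corpus, and the candidate sentences are all hypothetical; they are not taken from the Russian National Corpus or RULEC-GEC, and candidate generation is left out.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Collect unigram and bigram counts from whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one smoothed bigram model."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(unigrams) + 1  # one extra slot for unseen words
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = unigrams[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


def pick_best(candidates, unigrams, bigrams):
    """Return the candidate sentence the model considers most probable."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Hypothetical stand-in for untagged newspaper text (assumption, not real data).
TOY_CORPUS = [
    "я читаю интересную книгу",
    "она читает интересную книгу",
    "я вижу новую книгу",
]

unigrams, bigrams = train_bigram_counts(TOY_CORPUS)

# A learner-style case error (nominative instead of accusative) and its fix.
candidates = [
    "я читаю интересная книга",
    "я читаю интересную книгу",
]
best = pick_best(candidates, unigrams, bigrams)
print(best)
```

In this toy setting the model prefers the accusative form because its bigrams occur in the training text, while the erroneous case forms do not. A model trained on a large corpus would rely on the same principle at a much larger scale.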