CL Apr 20, 2021

Grammatical Error Generation Based on Translated Fragments

Eetu Sjöblom, Mathias Creutz, Teemu Vahtola

arXiv:2104.09933v1 31.6725 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for diverse synthetic data to improve grammatical error correction systems, particularly for text written by non-native English learners. It is an incremental advance because it builds on existing synthetic data creation methods.

The paper generates synthetic training data for English grammatical error correction by applying neural machine translation to sentence fragments, which simulates the mistakes made by second language learners. A model trained on this data outperforms a baseline model on test data with a high proportion of errors.

We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims at simulating mistakes made by second language learners, and produces a wider range of non-native style language in comparison to state-of-the-art synthetic data creation methods. In addition to purely grammatical errors, our approach generates other types of errors, such as lexical errors. We perform grammatical error correction experiments using neural sequence-to-sequence models, and carry out quantitative and qualitative evaluation. A model trained on data created using our proposed method is shown to outperform a baseline model on test data with a high proportion of errors.
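The fragment-wise translation idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not say how fragments are chosen or where the corrected target sentence comes from, so the fixed-size chunking, the `translate` callable, and the `reference_english` argument are all hypothetical stand-ins.

```python
from typing import Callable, List, Tuple


def split_into_fragments(tokens: List[str], size: int) -> List[List[str]]:
    """Cut a token list into consecutive chunks of at most `size` tokens.

    Fixed-size chunking is an assumption made for illustration only.
    """
    if size < 1:
        raise ValueError("size must be >= 1")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def translate_by_fragments(
    sentence: str,
    translate: Callable[[str], str],
    size: int = 3,
) -> str:
    """Translate each fragment in isolation and join the results.

    `translate` is a hypothetical stand-in for any machine translation
    function. Because each fragment is translated without its context,
    the joined output can contain agreement, word-order, or lexical errors.
    """
    fragments = split_into_fragments(sentence.split(), size)
    return " ".join(translate(" ".join(fragment)) for fragment in fragments)


def make_training_pair(
    source_sentence: str,
    reference_english: str,
    translate: Callable[[str], str],
    size: int = 3,
) -> Tuple[str, str]:
    """Build one (erroneous input, corrected target) training pair.

    Assumes a clean English reference is already available, for example
    from a parallel corpus. That is an assumption, not a detail from the
    abstract.
    """
    noisy = translate_by_fragments(source_sentence, translate, size)
    return noisy, reference_english
```

In practice `translate` would wrap a real translation model; any callable that maps a string to a string fits the sketch, which keeps the fragmenting logic separate from the choice of translation system.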
