CL · Jul 21, 2019

The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction

arXiv:1907.08889v11091 citations
Originality Synthesis-oriented
AI Analysis

This work addresses the problem of data scarcity in grammatical error correction for language learners and educators, but it is incremental, as it builds on existing artificial corpus generation methods.

The paper investigates using neural models to generate artificial grammatical errors for training grammatical error correction systems, finding that this approach can reduce reliance on expensive human-annotated data.

In recent years, sequence-to-sequence models have been very effective for end-to-end grammatical error correction (GEC). As creating a human-annotated parallel corpus for GEC is expensive and time-consuming, there has been work on artificial corpus generation with the aim of creating sentences that contain realistic grammatical errors from grammatically correct sentences. In this paper, we investigate the impact of using recent neural models for generating errors to help neural models correct errors. We conduct a battery of experiments on the effect of data size, models, and comparison with a rule-based approach.
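
To make the idea of artificial corpus generation concrete, here is a minimal sketch of a rule-based error generator of the general kind the abstract contrasts with neural generation. It is not the paper's method: the two rules (article deletion and preposition substitution), the word lists, the `rate` parameter, and the names `corrupt` and `make_pairs` are all assumptions chosen for illustration.

```python
import random

# Hypothetical word lists; a real system would use far richer resources.
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = ["in", "on", "at", "for", "to", "of"]


def corrupt(sentence, rate=0.3, seed=0):
    """Return a noisy copy of a grammatically correct sentence."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    noisy = []
    for token in sentence.split():
        lower = token.lower()
        if rng.random() >= rate:
            noisy.append(token)  # leave this token untouched
        elif lower in ARTICLES:
            continue  # rule 1: delete the article
        elif lower in PREPOSITIONS:
            # rule 2: swap the preposition for a different one
            choices = [p for p in PREPOSITIONS if p != lower]
            noisy.append(rng.choice(choices))
        else:
            noisy.append(token)  # no rule applies to this token
    return " ".join(noisy)


def make_pairs(sentences, rate=0.3, seed=0):
    """Build (erroneous source, correct target) pairs for GEC training."""
    return [(corrupt(s, rate, seed + i), s) for i, s in enumerate(sentences)]


if __name__ == "__main__":
    for source, target in make_pairs(["She arrived at the station in the morning"]):
        print(source, "->", target)
```

Each pair puts the corrupted sentence on the source side and the original on the target side, which is the parallel format a sequence-to-sequence GEC model trains on. A neural error generator would replace `corrupt` with a model trained in the reverse direction, from correct text to erroneous text.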

