Scoring Edit Impact in Grammatical Error Correction via Embedded Association Graphs
The work addresses the need for scalable evaluation in GEC, where a sentence can have multiple valid corrections, but the contribution is incremental because it builds on existing meta-evaluation approaches.
The paper proposes a new task, automatically scoring the importance of edits in Grammatical Error Correction (GEC), together with a framework based on embedded association graphs; the framework outperforms baselines across multiple datasets, languages, and systems.
A Grammatical Error Correction (GEC) system produces a sequence of edits to correct an erroneous sentence. The quality of these edits is typically evaluated against human annotations. However, a sentence may admit multiple valid corrections, and existing evaluation settings do not fully accommodate diverse application scenarios. Recent meta-evaluation approaches rely on human judgments across multiple references, which makes them difficult to scale to large datasets. In this paper, we propose a new task, Scoring Edit Impact in GEC, which aims to automatically estimate the importance of edits produced by a GEC system. To address this task, we introduce a scoring framework based on an embedded association graph. The graph captures latent dependencies among edits and clusters syntactically related edits into coherent groups. We then perform perplexity-based scoring to estimate each edit's contribution to sentence fluency. Experiments across four GEC datasets, four languages, and four GEC systems demonstrate that our method consistently outperforms a range of baselines. Further analysis shows that the embedded association graph effectively captures cross-linguistic structural dependencies among edits.
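The grouping and scoring steps described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's implementation: the association graph is supplied as a ready-made edge list (the abstract does not say how edges are derived from embeddings), groups are taken to be connected components, a toy add-one-smoothed bigram model stands in for a real language model, and a group's impact is defined as the rise in perplexity when that group is left out of the fully corrected sentence. The names `connected_groups`, `apply_edits`, `BigramLM`, and `score_groups` are hypothetical.

```python
import math
from collections import Counter


def connected_groups(n, edges):
    """Group edit indices 0..n-1 into connected components (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def apply_edits(tokens, edits):
    """Apply non-overlapping (start, end, replacement_tokens) edits.

    Edits are applied right to left so earlier offsets stay valid.
    """
    out = list(tokens)
    for start, end, repl in sorted(edits, key=lambda e: e[0], reverse=True):
        out[start:end] = repl
    return out


class BigramLM:
    """Toy add-one-smoothed bigram model (assumed stand-in for a real LM)."""

    def __init__(self, corpus):
        self.uni, self.bi, self.vocab = Counter(), Counter(), set()
        for sent in corpus:
            toks = ["<s>"] + sent.split()
            self.vocab.update(toks)
            for a, b in zip(toks, toks[1:]):
                self.uni[a] += 1
                self.bi[(a, b)] += 1

    def perplexity(self, tokens):
        toks = ["<s>"] + list(tokens)
        v = len(self.vocab) + 1  # one extra slot for unseen words
        logp = sum(
            math.log((self.bi[(a, b)] + 1) / (self.uni[a] + v))
            for a, b in zip(toks, toks[1:])
        )
        return math.exp(-logp / (len(toks) - 1))


def score_groups(lm, source, edits, groups):
    """Assumed impact score: perplexity rise when a group is withheld."""
    full = lm.perplexity(apply_edits(source, edits))
    scores = []
    for g in groups:
        kept = [e for i, e in enumerate(edits) if i not in g]
        scores.append(lm.perplexity(apply_edits(source, kept)) - full)
    return scores


lm = BigramLM(["she goes to school", "they go to school"])
source = "she go to school".split()
edits = [(1, 2, ["goes"])]            # replace "go" with "goes"
groups = connected_groups(len(edits), [])
scores = score_groups(lm, source, edits, groups)
```

Under this definition, a positive score means the sentence becomes less fluent without the group, so the group matters more. Scoring whole groups rather than single edits keeps dependent edits (for example, a subject and its agreeing verb) from being evaluated in a half-applied state.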