CL · May 18, 2023

CLEME: Debiasing Multi-reference Evaluation for Grammatical Error Correction

arXiv:2305.10819v2 · 132 citations
Originality: Incremental advance
AI Analysis

This paper addresses a specific evaluation problem for GEC researchers, offering an incremental improvement over existing metrics.

The paper tackles bias in multi-reference evaluation for Grammatical Error Correction (GEC) by proposing CLEME, a metric that builds chunk sequences with consistent boundaries to remove the bias caused by inconsistent edit boundaries. Experiments on six English reference sets demonstrate its effectiveness and robustness.

Evaluating the performance of Grammatical Error Correction (GEC) systems is a challenging task due to its subjectivity. Designing an evaluation metric that is as objective as possible is crucial to the development of the GEC task. However, mainstream evaluation metrics, i.e., reference-based metrics, introduce bias into multi-reference evaluation by extracting edits without considering the presence of multiple references. To overcome this issue, we propose Chunk-LEvel Multi-reference Evaluation (CLEME), designed to evaluate GEC systems in the multi-reference evaluation setting. CLEME builds chunk sequences with consistent boundaries for the source, the hypothesis, and the references, thus eliminating the bias caused by inconsistent edit boundaries. Furthermore, we observe that the consistent boundaries could also act as the boundaries of grammatical errors, based on which the $F_{0.5}$ score is then computed following the correction independence assumption. We conduct experiments on six English reference sets based on the CoNLL-2014 shared task. Extensive experiments and detailed analyses demonstrate the correctness of our discovery and the effectiveness of CLEME. Further analysis reveals that CLEME is robust when evaluating GEC systems across reference sets with varying numbers of references and annotation styles.
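The two ideas in the abstract, shared chunk boundaries and an $F_{0.5}$ score, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the helper names `merge_spans` and `f_beta`, the half-open span representation, and the example counts are all assumptions made for illustration. Zero-length insertion spans are left out for brevity. Only the $F_\beta$ formula itself is standard.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # hypothetical: half-open [start, end) token span on the source


def merge_spans(span_sets: List[List[Span]]) -> List[Span]:
    """Pool edit spans from the hypothesis and all references, then merge
    overlapping spans so that every sequence shares the same boundaries.
    Illustrative only; assumes start < end for every span."""
    pooled = sorted({span for spans in span_sets for span in spans})
    merged: List[Span] = []
    for start, end in pooled:
        if merged and start < merged[-1][1]:
            # Overlaps the previous chunk: extend it instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """Standard F-beta from true positive, false positive, and false negative
    counts. beta = 0.5 weights precision more heavily than recall."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical edit spans: one hypothesis and two references.
hypothesis_edits = [(1, 3)]
reference_1_edits = [(2, 4)]
reference_2_edits = [(6, 7)]

chunks = merge_spans([hypothesis_edits, reference_1_edits, reference_2_edits])
score = f_beta(tp=3, fp=1, fn=2)  # made-up counts, not results from the paper
```

In this toy example the overlapping spans `(1, 3)` and `(2, 4)` collapse into a single chunk `(1, 4)`, so the hypothesis and both references are compared over the same boundaries rather than over edits that disagree on where an error starts and ends.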

Code Implementations: 1 repo
