Proficiency Matters Quality Estimation in Grammatical Error Correction
This work addresses a limitation in real-world applications of grammatical error correction for language learners, but the contribution is incremental: it builds on prior work, with its main focus on dataset bias.
The study addresses bias in quality estimation models for grammatical error correction. It shows that these models perform differently when evaluated on data from learners with varying proficiency levels, and it finds that proficiency-wise evaluation leads to more robust models.
This study investigates how supervised quality estimation (QE) models of grammatical error correction (GEC) are affected by the proficiency of the learners who produced the data. QE models for GEC in prior work have obtained high correlation with manual evaluations. However, the data behind those reported results limit their relevance to real-world use, because prior work was biased toward data from learners with relatively high proficiency levels. To address this issue, we created a QE dataset that includes multiple proficiency levels and explored the necessity of performing proficiency-wise evaluation for QE of GEC. Our experiments demonstrated that differences in the proficiency level of the evaluation dataset affect the performance of QE models, and that proficiency-wise evaluation helps create more robust models.
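As a rough illustration of what proficiency-wise evaluation could look like in practice, the sketch below groups QE scores by learner proficiency level and correlates them with manual scores within each group, alongside a single pooled correlation. This is a minimal sketch under stated assumptions, not the authors' evaluation code: the choice of Pearson correlation, the level names ("beginner", "advanced"), the function names, and the toy scores are all hypothetical and made up for illustration.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def proficiency_wise_correlation(records):
    """Group (level, qe_score, manual_score) records by level, then correlate
    QE scores with manual scores separately inside each level."""
    by_level = defaultdict(lambda: ([], []))
    for level, qe_score, manual_score in records:
        by_level[level][0].append(qe_score)
        by_level[level][1].append(manual_score)
    return {level: pearson(qe, manual) for level, (qe, manual) in by_level.items()}


# Hypothetical toy records: (proficiency level, QE model score, manual score).
# These numbers are invented for illustration and are not results from the study.
records = [
    ("beginner", 0.2, 1.0), ("beginner", 0.4, 2.0), ("beginner", 0.6, 3.0),
    ("advanced", 0.9, 1.0), ("advanced", 0.5, 2.0), ("advanced", 0.1, 3.0),
]

per_level = proficiency_wise_correlation(records)
pooled = pearson([r[1] for r in records], [r[2] for r in records])

for level, r in per_level.items():
    print(f"{level}: r = {r:.3f}")
print(f"pooled: r = {pooled:.3f}")
```

In this toy setup the pooled correlation sits between the two per-level values, so a single pooled number can hide the fact that a QE model behaves very differently across proficiency levels. Reporting one correlation per level makes that difference visible.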