Some Grammatical Errors are Frequent, Others are Important
This work is relevant to natural language processing researchers and practitioners because it suggests that current evaluation metrics may not align with human perception of errors. It is, however, incremental, since it builds on existing error correction frameworks.
The paper tackles the problem of evaluating grammatical error correction systems by quantifying how important different error types are to humans. It shows that some rare error types can be more disturbing to readers than some common ones, a finding with implications for how systems are both improved and evaluated.
In Grammatical Error Correction (GEC), systems are evaluated by the number of errors they correct. However, no one has assessed whether all error types are equally important. We provide and apply a method to quantify the importance of different grammatical error types to humans. We show that some rare errors are considered disturbing while other, common ones are not. This has implications for how both systems and their evaluation might be improved.
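To make the contrast concrete, the following is a minimal toy sketch, not the authors' method, of how a count-based score (one point per corrected error) can rank systems differently from a score that weights each corrected error by the importance of its type. The error-type labels, the weights in IMPORTANCE, and the two systems' outputs are all hypothetical placeholders; in practice such weights would have to come from human judgments like those the abstract describes.

```python
"""Toy comparison of count-based vs. importance-weighted GEC scoring.

All error-type labels and weights below are hypothetical placeholders,
not values from the paper.
"""

# Hypothetical importance weight per error type (e.g. an average human
# "disturbance" rating scaled to 0-1). Illustrative numbers only.
IMPORTANCE = {
    "punctuation": 0.1,  # assumed frequent but only mildly disturbing
    "determiner": 0.4,
    "verb_tense": 0.9,   # assumed rarer but highly disturbing
}


def count_score(corrected: list[str]) -> int:
    """Count-based view: every corrected error is worth 1, whatever its type."""
    return len(corrected)


def weighted_score(corrected: list[str], weights: dict[str, float]) -> float:
    """Importance-weighted view: each corrected error is worth its type's weight."""
    return sum(weights[error_type] for error_type in corrected)


# Error types that each hypothetical system corrected on the same test set.
system_a = ["punctuation", "punctuation", "punctuation"]
system_b = ["verb_tense"]

for name, corrected in [("A", system_a), ("B", system_b)]:
    # Print: system name, count-based score, weighted score (rounded).
    print(name, count_score(corrected), round(weighted_score(corrected, IMPORTANCE), 2))
```

Running it prints `A 3 0.3` and `B 1 0.9`: counting corrections ranks system A first, while weighting by the assumed importance ranks system B first. The sketch only illustrates why the choice of weights matters for evaluation; it says nothing about how the paper actually elicits or applies importance judgments.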