CL · LG · Oct 5, 2025

Time Is Effort: Estimating Human Post-Editing Time for Grammar Error Correction Tool Evaluation

arXiv:2510.04394v1 · 1 citation · h-index: 5 · Has Code
Proceedings of the Fourth Workshop on Bridging Human-Computer Interaction and Natural Language Processing (HCI+NLP)
Originality: Incremental advance
AI Analysis

The paper provides a human-centric evaluation method for GEC tools that addresses usability for writers and editors. It is incremental in that it focuses on estimating editing time rather than proposing fundamentally new correction approaches.

The researchers evaluate grammar error correction (GEC) tools by quantifying how much editing effort they save users. They introduce a new dataset and a metric called PEET that estimates post-editing time, and show that it correlates well with human judgments.

Text editing can involve several iterations of revision. Incorporating an efficient Grammar Error Correction (GEC) tool in the initial correction round can significantly impact further human editing effort and final text quality. This raises an interesting question to quantify GEC Tool usability: How much effort can the GEC Tool save users? We present the first large-scale dataset of post-editing (PE) time annotations and corrections for two English GEC test datasets (BEA19 and CoNLL14). We introduce Post-Editing Effort in Time (PEET) for GEC Tools as a human-focused evaluation scorer to rank any GEC Tool by estimating PE time-to-correct. Using our dataset, we quantify the amount of time saved by GEC Tools in text editing. Analyzing the edit type indicated that determining whether a sentence needs correction and edits like paraphrasing and punctuation changes had the greatest impact on PE time. Finally, comparison with human rankings shows that PEET correlates well with technical effort judgment, providing a new human-centric direction for evaluating GEC tool usability. We release our dataset and code at: https://github.com/ankitvad/PEET_Scorer.
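
The ranking and correlation steps described in the abstract can be illustrated with a minimal sketch. This is not the released PEET_Scorer code: the tool names, the per-sentence time values, and the human ranking below are hypothetical placeholders, and the use of Spearman rank correlation is an assumption, since the abstract does not say which correlation measure was used.

```python
from statistics import mean


def rank_tools(pe_times):
    """Rank tools by mean estimated post-editing time (lower is better)."""
    avg = {tool: mean(times) for tool, times in pe_times.items()}
    return sorted(avg, key=avg.get)


def spearman(order_a, order_b):
    """Spearman rank correlation between two rankings of the same items.

    Assumes both rankings contain the same items and have no ties.
    """
    n = len(order_a)
    pos_b = {item: i for i, item in enumerate(order_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical per-sentence post-editing time estimates, in seconds.
# In practice these would come from a trained time estimator.
estimated_pe_seconds = {
    "tool_a": [12.0, 8.5, 15.0],
    "tool_b": [9.0, 7.0, 11.0],
    "tool_c": [20.0, 14.0, 18.5],
}

# Hypothetical human ranking of the same tools, best first.
human_ranking = ["tool_b", "tool_a", "tool_c"]

metric_ranking = rank_tools(estimated_pe_seconds)
rho = spearman(metric_ranking, human_ranking)
print(metric_ranking, rho)
```

A correlation close to 1 would mean the time-based ranking orders the tools the same way the human judges do.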

