CL · Mar 24, 2022

Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction

arXiv:2203.13064v1 · 642 citations · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses grammatical error correction for language processing applications, with incremental improvements through ensembling and dataset generation.

The paper improves grammatical error correction (GEC) in two ways: it ensembles large Transformer-based encoders, and it uses knowledge distillation from the ensemble to create synthetic training datasets. The best ensemble achieves a new SOTA F0.5 score of 76.05 on BEA-2019 (test), and the best single model achieves a near-SOTA score of 73.21.
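For reference, the F0.5 score quoted here is the standard F-measure with $\beta = 0.5$, which weights precision $P$ more heavily than recall $R$. This is the general textbook definition, not a formula taken from the paper:

$$
F_{\beta} = (1 + \beta^{2}) \, \frac{P \cdot R}{\beta^{2} P + R},
\qquad
F_{0.5} = \frac{1.25 \, P \cdot R}{0.25 \, P + R}.
$$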

In this paper, we investigate improvements to the GEC sequence tagging architecture with a focus on ensembling of recent cutting-edge Transformer-based encoders in Large configurations. We encourage ensembling models by majority votes on span-level edits because this approach is tolerant to the model architecture and vocabulary size. Our best ensemble achieves a new SOTA result with an $F_{0.5}$ score of 76.05 on BEA-2019 (test), even without pre-training on synthetic datasets. In addition, we perform knowledge distillation with a trained ensemble to generate new synthetic training datasets, "Troy-Blogs" and "Troy-1BW". Our best single sequence tagging model, pretrained on the generated Troy datasets in combination with the publicly available synthetic PIE dataset, achieves a near-SOTA result with an $F_{0.5}$ score of 73.21 on BEA-2019 (test); to the best of our knowledge, it gives way only to the much heavier T5 model. The code, datasets, and trained models are publicly available.
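The majority vote on span-level edits mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Edit` representation, the function names `majority_vote` and `apply_edits`, and the `min_votes` threshold are all hypothetical, and the sketch assumes each system's output has already been aligned to the source sentence and converted to token-span edits. Because only edits are compared, the systems can differ in architecture and vocabulary.

```python
from collections import Counter
from typing import List, Set, Tuple

# Hypothetical edit format: replace source tokens [start, end) with `replacement`.
Edit = Tuple[int, int, Tuple[str, ...]]


def majority_vote(edit_sets: List[Set[Edit]], min_votes: int) -> List[Edit]:
    """Keep every edit proposed by at least `min_votes` systems."""
    counts = Counter(edit for edits in edit_sets for edit in edits)
    return sorted(edit for edit, votes in counts.items() if votes >= min_votes)


def apply_edits(tokens: List[str], edits: List[Edit]) -> List[str]:
    """Apply edits right to left so earlier token indices stay valid."""
    out = list(tokens)
    last_start = len(tokens)
    for start, end, replacement in sorted(edits, reverse=True):
        if end > last_start:
            continue  # skip an edit that overlaps one already applied
        out[start:end] = replacement
        last_start = start
    return out


source = "She go to school yesterday".split()
system_edits = [
    {(1, 2, ("went",))},                               # system A
    {(1, 2, ("went",)), (3, 4, ("the", "school"))},    # system B
    {(1, 2, ("goes",))},                               # system C
]
kept = majority_vote(system_edits, min_votes=2)
corrected = " ".join(apply_edits(source, kept))
print(corrected)
```

In this toy example only the edit "go" to "went" is proposed by two of the three systems, so it is the only one applied. Raising `min_votes` trades recall for precision, which suits a precision-weighted metric such as $F_{0.5}$.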

Code Implementations: 1 repo
