Hierarchical Character Tagger for Short Text Spelling Error Correction
This paper addresses spelling correction for short texts and offers a more efficient alternative to existing methods, though its improvements are incremental.
The paper tackles the problem of spelling error correction in short texts by introducing HCTagger, a character-level model that predicts edits to transform misspelled text into error-free form, achieving faster inference and competitive accuracy on public datasets.
State-of-the-art approaches to the spelling error correction problem include Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference, and sequence labeling models based on Transformer encoders like BERT, which involve a token-level label space and therefore a large pre-defined vocabulary dictionary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained language model at the character level as a text encoder, and then predict character-level edits to transform the original text into its error-free form with a much smaller label space. For decoding, we propose a hierarchical multi-task approach to alleviate the issue of long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is an accurate and much faster approach than many existing models.
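
To make the idea of character-level edit tagging concrete, the following is a minimal sketch of how one predicted tag per input character could be applied to produce corrected text. The tag names (KEEP, DELETE, REPLACE_x, APPEND_x) and the helper functions are hypothetical and chosen for illustration; the abstract does not specify HCTagger's actual label set. The split_tag helper shows one way a flat tag could be factored into a coarse operation and a character argument, which is only an assumed reading of what a hierarchical label might look like and not the paper's confirmed decoding scheme.

```python
from typing import List, Tuple

# Hypothetical tag inventory, for illustration only:
#   KEEP       copy the character unchanged
#   DELETE     drop the character
#   REPLACE_x  emit x instead of the character
#   APPEND_x   emit the character, then x
KEEP = "KEEP"
DELETE = "DELETE"
REPLACE_PREFIX = "REPLACE_"
APPEND_PREFIX = "APPEND_"


def apply_char_edits(text: str, tags: List[str]) -> str:
    """Apply exactly one edit tag per input character and return the result."""
    if len(text) != len(tags):
        raise ValueError("need exactly one tag per character")
    out = []
    for ch, tag in zip(text, tags):
        if tag == KEEP:
            out.append(ch)
        elif tag == DELETE:
            continue
        elif tag.startswith(REPLACE_PREFIX):
            out.append(tag[len(REPLACE_PREFIX):])
        elif tag.startswith(APPEND_PREFIX):
            out.append(ch + tag[len(APPEND_PREFIX):])
        else:
            raise ValueError(f"unknown tag: {tag}")
    return "".join(out)


def split_tag(tag: str) -> Tuple[str, str]:
    """Factor a flat tag into (coarse operation, character argument).

    Assumed illustration of a two-level label; the argument is empty
    for KEEP and DELETE.
    """
    op, _, arg = tag.partition("_")
    return op, arg


if __name__ == "__main__":
    text = "speling eror"
    tags = [KEEP] * len(text)
    tags[3] = "APPEND_l"   # "l" -> "ll"
    tags[9] = "APPEND_r"   # "r" -> "rr"
    print(apply_char_edits(text, tags))
```

Under this sketch the label space grows with the character set (a few operations times the number of characters) rather than with a word or subword vocabulary, which is the contrast with token-level tagging that the abstract draws.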