LG | ML | May 11, 2020

Hierarchical Attention Transformer Architecture For Syntactic Spell Correction

arXiv:2005.04876v1
Originality: Incremental advance
AI Analysis

This work addresses the need for a reliable and fast post-processing textual module for mobile phone applications, offering incremental improvements in spell correction.

The paper tackles spell correction with a hierarchical attention transformer architecture that improves character, word, and sentence error rates by 0.11%, 0.32%, and 0.69%, respectively, over existing state-of-the-art machine-translation architectures, while training about 7.8 times faster than the next most accurate model at roughly one-third of its size.
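The three reported metrics can be illustrated with a minimal sketch. It assumes the common definitions (edit distance over characters or words divided by reference length, and the fraction of sentences that are not an exact match); the paper's exact evaluation script is not given here, and the function names `edit_distance` and `error_rates` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rates(references, hypotheses):
    """Return CER, WER and SER as fractions (assumed standard definitions)."""
    char_errs = char_total = word_errs = word_total = sent_errs = 0
    for ref, hyp in zip(references, hypotheses):
        char_errs += edit_distance(ref, hyp)
        char_total += len(ref)
        ref_words, hyp_words = ref.split(), hyp.split()
        word_errs += edit_distance(ref_words, hyp_words)
        word_total += len(ref_words)
        sent_errs += int(ref != hyp)  # a sentence counts as wrong unless it matches exactly
    return {
        "CER": char_errs / char_total,
        "WER": word_errs / word_total,
        "SER": sent_errs / len(references),
    }


if __name__ == "__main__":
    refs = ["the cat sat", "hello world"]
    hyps = ["the cat sat", "hallo world"]
    print(error_rates(refs, hyps))
```

Under these definitions a single wrong character moves SER far more than CER, which is consistent with the sentence-level gain being the largest of the three reported numbers.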

Attention mechanisms have driven recent advances in sequence-to-sequence problems. The Transformer architecture achieved new state-of-the-art results in machine translation, and its variants have since been introduced in several other sequence-to-sequence problems. Problems that involve a shared vocabulary can benefit from the similar semantic and syntactic structure of the source and target sentences. With the motivation of building a reliable and fast post-processing textual module to assist all text-related use cases in mobile phones, we take on the popular spell correction problem. In this paper, we propose a multi-encoder, single-decoder variation of the conventional transformer. Outputs from the three encoders, which take character-level 1-gram, 2-gram and 3-gram inputs, are attended to in a hierarchical fashion in the decoder. The context vectors from the encoders, combined with self-attention, amplify the n-gram properties at the character level and help in accurate decoding. We demonstrate our model on a spell correction dataset from Samsung Research and report significant improvements of 0.11%, 0.32% and 0.69% in character (CER), word (WER) and sentence (SER) error rates over existing state-of-the-art machine-translation architectures. Our architecture also trains ~7.8 times faster and is only about 1/3 the size of the next most accurate model.
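The multi-encoder idea can be sketched in a few lines. The example below builds the character-level 1-gram, 2-gram and 3-gram sequences that would feed the three encoders, then shows one possible reading of "hierarchical" attention in which the decoder state attends to each encoder's outputs in turn. The attention order, the residual addition, the absence of learned projections, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import math


def char_ngrams(text, n):
    """Sliding-window character n-grams; returns [] if text is shorter than n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def encoder_inputs(text):
    """One token sequence per encoder, keyed by n-gram order (assumed tokenization)."""
    return {n: char_ngrams(text, n) for n in (1, 2, 3)}


def attend(query, memory):
    """Scaled dot-product attention of one query vector over a list of memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * m for q, m in zip(query, vec)) / scale for vec in memory]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * vec[d] for w, vec in zip(weights, memory))
            for d in range(len(query))]


def hierarchical_context(decoder_state, enc_1gram, enc_2gram, enc_3gram):
    """Attend to the three encoder outputs one after another (assumed order),
    adding each context vector back to the running state as a residual."""
    state = decoder_state
    for memory in (enc_1gram, enc_2gram, enc_3gram):
        context = attend(state, memory)
        state = [s + c for s, c in zip(state, context)]
    return state


if __name__ == "__main__":
    print(encoder_inputs("helo"))
    # Toy 2-dimensional encoder outputs, purely illustrative.
    print(hierarchical_context([1.0, 0.0],
                               [[1.0, 0.0], [0.0, 1.0]],
                               [[0.5, 0.5]],
                               [[0.0, 1.0], [1.0, 1.0]]))
```

A real model would embed each n-gram token, run it through its own encoder stack, and use learned query, key and value projections with multiple heads; the sketch keeps only the data flow from three n-gram views into a single decoder state.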

