CL · AI · May 2, 2018

KNPTC: Knowledge and Neural Machine Translation Powered Chinese Pinyin Typo Correction

arXiv:1805.00741v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses a practical issue for Chinese-language users on mobile devices and reports a substantial accuracy improvement over the previous state-of-the-art system.

The paper tackles the problem of correcting typos in Chinese pinyin input by proposing KNPTC, a neural machine translation-based approach that integrates explicit knowledge from user typing behaviors, achieving a 32.77% average increase in accuracy compared to the state-of-the-art system.

Chinese pinyin input methods are very important for Chinese language processing. Users inevitably make typos when they input pinyin, and pinyin typo correction has become an increasingly important task with the popularity of smartphones and the mobile Internet. How to exploit knowledge of users' typing behaviors and how to support typo correction for acronym pinyin remain challenging problems. To tackle these challenges, we propose KNPTC, a novel approach based on neural machine translation (NMT). In contrast to previous work, KNPTC integrates explicit knowledge into NMT for pinyin typo correction and learns to correct a variety of typos without the guidance of manually selected constraints or language-specific features. In this approach, we first obtain the transition probabilities between adjacent letters from large-scale real-life datasets. Then, we use these probabilities to construct the "ground-truth" alignments of training sentence pairs. Finally, these alignments are integrated into NMT to capture sensible pinyin typo correction patterns. Applied to real-life datasets, KNPTC achieves an average increase of 32.77% in typo-correction accuracy over the state-of-the-art system.
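The abstract does not say how the transition probabilities between adjacent letters are estimated. As a minimal sketch of one plausible reading, the snippet below computes a maximum-likelihood estimate of P(next letter | current letter) from adjacent letter pairs in a list of pinyin strings. The function name `transition_probabilities` and the toy corpus are hypothetical, and this is not presented as KNPTC's actual procedure.

```python
from collections import Counter, defaultdict


def transition_probabilities(pinyin_strings):
    """Estimate P(next_letter | current_letter) from adjacent letter pairs.

    Assumption: a plain bigram count normalized per current letter.
    The paper may use a different estimator or different data.
    """
    pair_counts = defaultdict(Counter)
    for s in pinyin_strings:
        # zip(s, s[1:]) yields every pair of adjacent letters in the string.
        for cur, nxt in zip(s, s[1:]):
            pair_counts[cur][nxt] += 1

    probs = {}
    for cur, counter in pair_counts.items():
        total = sum(counter.values())
        probs[cur] = {nxt: n / total for nxt, n in counter.items()}
    return probs


# Hypothetical toy corpus, standing in for a large real-life dataset.
corpus = ["nihao", "nihao", "nimen", "hao"]
probs = transition_probabilities(corpus)

for cur in sorted(probs):
    print(cur, probs[cur])
```

In this toy corpus, the letter `i` is followed by `h` twice and by `m` once, so the estimate for `i` puts two thirds of the mass on `h`. A table like this could then be used to score candidate character alignments between a mistyped sequence and its correction, which is the role the abstract assigns to the probabilities.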

