CL · LG · ML · Oct 23, 2019

A context sensitive real-time Spell Checker with language adaptability

arXiv:1910.11242v1 · 28 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the lack of publicly available real-time spell checkers that can easily be extended to non-English languages, although its approach is incremental.

The authors build a real-time, context-sensitive spell checker that can be adapted to multiple languages. It achieves performance comparable to industry tools across 11 languages and is evaluated on synthetic datasets for 24 languages, with minimal language-specific processing.

We present a novel language-adaptable spell checking system that detects spelling errors and suggests context-sensitive corrections in real time. We show that our system can be extended to new languages with minimal language-specific processing. The available literature mostly discusses spell checkers for English, and there are no publicly available systems that can be extended to other languages out of the box. Most existing systems also do not work in real time. We explain the process of generating a language's word dictionary and n-gram probability dictionaries from Wikipedia article data and manually curated video subtitles. We present the results of generating a list of suggestions for a misspelled word. We also propose three approaches to create noisy channel datasets of real-world typographic errors. We compare our system with industry-accepted spell checker tools for 11 languages. Finally, we show the performance of our system on synthetic datasets for 24 languages.
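To make the dictionary-plus-n-gram idea concrete, here is a minimal Python sketch that builds unigram and bigram counts from a corpus, generates candidates within one edit of a misspelled word, and ranks them by a smoothed bigram probability given the previous word. It is an illustration under stated assumptions, not the authors' implementation: the function names (`build_models`, `edits1`, `suggest`), the edit-distance-1 candidate generator, and add-one smoothing are hypothetical choices. Deriving the alphabet from the corpus is one simple way to avoid hard-coding a language's character set.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Unicode letter runs, lowercased; real languages may need extra rules (assumption).
    return re.findall(r"[^\W\d_]+", text.lower())


def build_models(corpus_text):
    """Return unigram counts, bigram counts, and the alphabet seen in the corpus."""
    tokens = tokenize(corpus_text)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    alphabet = sorted({ch for word in unigrams for ch in word})
    return unigrams, bigrams, alphabet


def edits1(word, alphabet):
    """All strings one delete, transpose, replace, or insert away from `word`."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    transposes = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    replaces = {l + c + r[1:] for l, r in splits if r for c in alphabet}
    inserts = {l + c + r for l, r in splits for c in alphabet}
    return deletes | transposes | replaces | inserts


def candidates(word, unigrams, alphabet):
    # Known words are left alone; otherwise keep dictionary words one edit away.
    if word in unigrams:
        return {word}
    return {w for w in edits1(word, alphabet) if w in unigrams}


def score(prev_word, cand, unigrams, bigrams, smoothing=1.0):
    """Add-`smoothing` log P(cand | prev_word); falls back to a unigram prior."""
    vocab = len(unigrams)
    if prev_word is None:
        total = sum(unigrams.values())
        return math.log((unigrams[cand] + smoothing) / (total + smoothing * vocab))
    return math.log(
        (bigrams[(prev_word, cand)] + smoothing)
        / (unigrams[prev_word] + smoothing * vocab)
    )


def suggest(prev_word, word, unigrams, bigrams, alphabet, k=3):
    """Top-k corrections for `word`, ranked by context score, ties broken alphabetically."""
    cands = candidates(word, unigrams, alphabet)
    ranked = sorted(cands, key=lambda c: (-score(prev_word, c, unigrams, bigrams), c))
    return ranked[:k]
```

For example, with a small toy corpus in which "the team" and "hot tea" both occur, the typo `tem` is ranked toward `team` after "the" and toward `tea` after "hot". A production system would need much larger corpora, longer n-gram context, and a faster candidate index than brute-force edit enumeration.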
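The abstract mentions noisy channel datasets of real-world typographic errors but does not describe the three approaches in detail. As a hedged illustration only, the sketch below generates synthetic (typo, correct) pairs by applying one random delete, insert, substitute, or transpose edit per word. It is a generic baseline, not a reproduction of any of the paper's approaches, and `make_typo` and `make_noisy_pairs` are hypothetical names.

```python
import random


def make_typo(word, alphabet, rng):
    """Apply one random single-character edit that actually changes `word`."""
    if not word:
        raise ValueError("word must be non-empty")
    ops = ["delete", "insert", "substitute"]
    if len(word) > 1:
        ops.append("transpose")
    while True:
        op = rng.choice(ops)
        if op == "delete":
            i = rng.randrange(len(word))
            typo = word[:i] + word[i + 1:]
        elif op == "insert":
            i = rng.randrange(len(word) + 1)  # insertion may also go at the end
            typo = word[:i] + rng.choice(alphabet) + word[i:]
        elif op == "substitute":
            i = rng.randrange(len(word))
            typo = word[:i] + rng.choice(alphabet) + word[i + 1:]
        else:  # swap two adjacent characters
            i = rng.randrange(len(word) - 1)
            typo = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        # Reject no-op edits (e.g. swapping "aa") and empty results, then retry.
        if typo and typo != word:
            return typo


def make_noisy_pairs(words, alphabet, n, seed=0):
    """Sample `n` (typo, correct) pairs; a fixed seed makes the dataset reproducible."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        correct = rng.choice(words)
        pairs.append((make_typo(correct, alphabet, rng), correct))
    return pairs
```

Uniform random edits are easy to reproduce in any language, but they ignore keyboard layout and real typing behaviour, which is presumably why the paper targets real-world typographic errors instead.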

Code Implementations: 2 repos