AraSpell: A Deep Learning Approach for Arabic Spelling Correction
This work addresses spelling correction for Arabic-language users, but its contribution is incremental: it applies existing seq2seq methods to a specific domain.
The paper tackles Arabic spelling correction by introducing AraSpell, a deep learning framework built on seq2seq models such as RNNs and Transformers and trained on more than 6.9 million sentences with artificially injected errors. It achieves a word error rate as low as 4.8% and a character error rate as low as 1.11% on a 100K-sentence test set.
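Artificial error injection turns clean sentences into (noisy input, clean target) pairs that a seq2seq model can learn to map back. The abstract does not specify AraSpell's noise model, so the following is a minimal sketch under stated assumptions: each non-space character is corrupted with probability `error_rate` by one of four hypothetical operations (delete, insert, substitute, swap with the next letter). The names `inject_errors`, `make_training_pairs`, `ARABIC_LETTERS`, and `OPERATIONS` are illustrative, not taken from the paper.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical letter inventory for insertions and substitutions (basic Arabic letters only).
ARABIC_LETTERS = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
# Hypothetical noise operations; AraSpell's actual error types are not given in the abstract.
OPERATIONS = ("delete", "insert", "substitute", "swap")


def inject_errors(sentence: str, error_rate: float = 0.1, seed: Optional[int] = None) -> str:
    """Return a noisy copy of `sentence`; each non-space character is corrupted with probability `error_rate`."""
    rng = random.Random(seed)
    chars = list(sentence)
    out: List[str] = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c == " " or rng.random() >= error_rate:
            out.append(c)  # spaces and unselected characters are kept unchanged
            i += 1
            continue
        op = rng.choice(OPERATIONS)
        if op == "delete":
            pass  # drop the character
        elif op == "insert":
            out.extend([c, rng.choice(ARABIC_LETTERS)])  # keep it and add a random letter after it
        elif op == "substitute":
            out.append(rng.choice(ARABIC_LETTERS))  # replace it with a random letter
        elif i + 1 < len(chars) and chars[i + 1] != " ":
            out.extend([chars[i + 1], c])  # swap with the next character in the same word
            i += 1  # the next character has been consumed as well
        else:
            out.append(c)  # swap impossible at a word boundary; keep the character
        i += 1
    return "".join(out)


def make_training_pairs(sentences: List[str], error_rate: float = 0.1, seed: int = 0) -> List[Tuple[str, str]]:
    """Build (noisy input, clean target) pairs for seq2seq training."""
    rng = random.Random(seed)
    return [(inject_errors(s, error_rate, rng.randrange(2**32)), s) for s in sentences]


if __name__ == "__main__":
    for noisy, clean in make_training_pairs(["ذهب الولد إلى المدرسة"], error_rate=0.2):
        print(noisy, "->", clean)
```

Because spaces are never touched, word boundaries survive, which keeps word-level alignment between input and target simple; a real pipeline might also inject word-level errors such as merged or split words.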
Spelling correction is the task of identifying spelling mistakes, typos, and grammatical mistakes in a given text and correcting them according to their context and grammatical structure. This work introduces "AraSpell," a framework for Arabic spelling correction that uses different seq2seq model architectures, such as Recurrent Neural Networks (RNNs) and Transformers, together with artificial data generation for error injection, and is trained on more than 6.9 million Arabic sentences. Thorough experimental studies provide empirical evidence of the effectiveness of the proposed approach, which achieved a word error rate (WER) of 4.8% and a character error rate (CER) of 1.11%, compared with 29.72% WER and 5.03% CER for the labeled data. Our approach also achieved 10.65% WER and 2.9% CER, compared with 50.94% WER and 10.02% CER for the labeled data. Both results are obtained on a test set of 100K sentences.
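WER and CER are conventionally computed as the Levenshtein edit distance (substitutions, insertions, deletions) between a hypothesis and its reference, divided by the reference length, counted in words for WER and in characters for CER. The minimal sketch below assumes corpus-level aggregation (total edits over total reference length) and counts spaces as characters for CER; the abstract does not say which conventions AraSpell uses, and the function names `edit_distance`, `wer`, and `cer` are hypothetical.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters), using two DP rows."""
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to each hyp prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1]


def wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level word error rate: total word edits / total reference words."""
    total = sum(len(r.split()) for r in refs)
    if total == 0:
        raise ValueError("references contain no words")
    return sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps)) / total


def cer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level character error rate: total character edits / total reference characters."""
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references contain no characters")
    return sum(edit_distance(r, h) for r, h in zip(refs, hyps)) / total
```

For example, `wer(["a b c d"], ["a x c"])` counts one substitution and one deletion against four reference words, giving 0.5. Per-sentence averaging would generally give different numbers, so reported figures are only comparable under the same convention.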