Real-Word Error Correction with Trigrams: Correcting Multiple Errors in a Sentence
This work addresses spelling correction for text mining applications, but it is incremental as it builds on prior models.
The paper tackled the problem of correcting multiple real-word errors in sentences by proposing a new variation using a Probabilistic Context-Free Grammar, and it showed that this approach outperformed existing methods on the Wall Street Journal corpus.
Spelling correction is a fundamental task in Text Mining. In this study, we assess the real-word error correction model proposed by Mays, Damerau and Mercer and describe several drawbacks of the model. We propose a new variation which focuses on detecting and correcting multiple real-word errors in a sentence, by manipulating a Probabilistic Context-Free Grammar (PCFG) to discriminate between items in the search space. We test our approach on the Wall Street Journal corpus and show that it outperforms Hirst and Budanitsky's WordNet-based method and Wilcox-O'Hearn, Hirst, and Budanitsky's fixed windows size method.
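To make the baseline concrete, the following is a minimal sketch of a trigram noisy-channel corrector in the spirit of the Mays, Damerau and Mercer model as it is commonly described: each candidate sentence is scored by a trigram language model multiplied by a channel term in which an observed word is assumed to be the intended word with probability alpha. It is not the PCFG-based variation proposed here. The toy corpus, the confusion sets, the values of `ALPHA` and `K`, the add-k smoothing, and the exhaustive search over all candidate combinations are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy training corpus; a real system would use a large corpus.
CORPUS = [
    "i want a piece of cake",
    "we signed a peace treaty",
    "they want peace and quiet",
    "give me a piece of paper",
]

ALPHA = 0.99  # assumed probability that an observed word is the intended one
K = 0.01      # assumed add-k smoothing constant

# Hypothetical, symmetric confusion sets of real-word spelling variants.
CONFUSIONS = {
    "piece": {"peace"}, "peace": {"piece"},
    "of": {"off"}, "off": {"of"},
}


def train(corpus):
    """Count trigrams and their two-word contexts; return counts and vocab size."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for line in corpus:
        toks = ["<s>", "<s>"] + line.split() + ["</s>"]
        vocab.update(toks)
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            tri[(a, b, c)] += 1
            ctx[(a, b)] += 1
    return tri, ctx, len(vocab)


def log_lm(words, tri, ctx, v):
    """Log score of a word sequence under the add-k smoothed trigram model."""
    toks = ["<s>", "<s>"] + list(words) + ["</s>"]
    return sum(
        math.log((tri[(a, b, c)] + K) / (ctx[(a, b)] + K * v))
        for a, b, c in zip(toks, toks[1:], toks[2:])
    )


def log_channel(observed, intended):
    """Log P(observed | intended): ALPHA if unchanged, else the remaining
    mass split evenly over the intended word's spelling variants."""
    total = 0.0
    for o, w in zip(observed, intended):
        if o == w:
            total += math.log(ALPHA)
        else:
            total += math.log((1 - ALPHA) / len(CONFUSIONS[w]))
    return total


def correct(sentence, tri, ctx, v):
    """Return the highest-scoring candidate sentence.

    Every word may be replaced by a variant, so several real-word errors
    can be corrected at once. The search is exhaustive and only feasible
    for short toy inputs.
    """
    observed = sentence.split()
    options = [[w] + sorted(CONFUSIONS.get(w, ())) for w in observed]
    best = max(
        product(*options),
        key=lambda cand: log_lm(cand, tri, ctx, v) + log_channel(observed, cand),
    )
    return " ".join(best)


tri, ctx, v = train(CORPUS)
print(correct("i want a peace off cake", tri, ctx, v))
```

Because the candidate space grows exponentially with the number of words that have spelling variants, an exhaustive search like this does not scale to realistic sentences, which is one reason a practical system needs a way to discriminate between items in the search space.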