Algorithms for certain classes of Tamil Spelling correction
This work addresses spelling correction challenges in Tamil, an agglutinative language, but is incremental as it summarizes known algorithms without introducing new methods.
The authors tackled the problem of Tamil spelling correction for conjoined words that are out-of-dictionary by proposing algorithmic techniques to efficiently handle them, such as decomposing words like [thendRalkattRu] into parts like [thendRal] and [kattRu] when these parts exist in a word list.
Tamil has an agglutinative, diglossic, alpha-syllabary structure, which produces a combinatorial explosion of morphological forms, all of which are in active use in Tamil prose and poetry from antiquity to the modern age in an unbroken chain of continuity. For language understanding and spelling correction, however, some of these forms present challenges as out-of-dictionary words. In this paper the authors propose algorithmic techniques to handle, in an efficient way, the specific problem of conjoined (out-of-dictionary) words, e.g. (in transliteration) [thendRalkattRu] = [thendRal] + [kattRu], when only the parts are present in the word list. The morphological structure of Tamil makes it necessary to depend on a synthesis-analysis approach, and dictionary lists will never be sufficient to truly capture the language. In this paper we have attempted to summarize various known algorithms for specific classes of Tamil spelling errors. We believe this collection of suggestions can improve future spelling checkers. We also note that we do not cover many important techniques, such as affix removal, that are of key importance in rule-based spell checkers.
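The conjoined-word case described above can be illustrated with a small dictionary-based segmentation. The following is a minimal sketch and not the authors' implementation: the function name `split_conjoined`, the `min_part_len` parameter, and the two-entry word list are assumptions made for illustration. It works on transliterated strings and assumes the parts join without any change at the boundary; for text in Tamil script one would likely split on letter boundaries rather than raw code points.

```python
from functools import lru_cache


def split_conjoined(word, lexicon, min_part_len=2):
    """Return every way to write `word` as two or more consecutive lexicon entries.

    `lexicon` is a set of known words (hypothetical word list).
    `min_part_len` is an illustrative guard against very short spurious parts.
    """

    @lru_cache(maxsize=None)
    def segment(start):
        # All segmentations of word[start:] into lexicon entries.
        if start == len(word):
            return [[]]  # one valid segmentation of the empty suffix
        results = []
        for end in range(start + min_part_len, len(word) + 1):
            part = word[start:end]
            if part in lexicon:
                for rest in segment(end):
                    results.append([part] + rest)
        return results

    # A word that is itself a single lexicon entry is not a conjoined word.
    return [parts for parts in segment(0) if len(parts) >= 2]


if __name__ == "__main__":
    word_list = {"thendRal", "kattRu"}  # toy word list taken from the example above
    print(split_conjoined("thendRalkattRu", word_list))
```

Memoizing on the start position keeps the number of distinct subproblems linear in the word length, although the number of returned segmentations can still grow when the word list contains many short entries. A spell checker could accept an out-of-dictionary word when this function returns at least one segmentation.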