A Binary Schema and Computational Algorithms to Process Vowel-based Euphonic Conjunctions for Word Searches
This work addresses a domain-specific challenge in Sanskrit text processing, offering incremental improvements for scholars and researchers in computational linguistics.
The paper tackles the problem of searching for Sanskrit words in electronic texts, in which words change form because of euphonic conjunctions (sandhi). It develops a binary schema and algorithms that process these conjunctions and generate the transformed word forms needed for a comprehensive search.
Comprehensively searching for words in Sanskrit e-text is a non-trivial problem because words can change form in different contexts. One such context is sandhi, or euphonic conjunction, in which a word changes owing to the presence of adjacent letters or words. The changes wrought by these conjunctions can be so significant in Sanskrit that searching only for a word in its given form can miss many of its occurrences. This work presents a schema that represents letters in a binary format and reduces Paninian rules of euphonic conjunction to simple bit set and unset operations. It presents an efficient algorithm that processes vowel-based sandhis using this schema, and a second algorithm that uses the sandhi processor to generate the possible transformed forms of a given word for use in a comprehensive word search.
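To make the idea of reducing sandhi rules to bit operations concrete, the following is a minimal sketch, not the paper's actual schema. The bit layout (three vowel-class bits plus a length bit), the names `join_vowels` and `final_vowel_variants`, and the restriction to two well-known vowel rules (similar vowels merging into a long vowel, and a/ā followed by i/ī or u/ū giving e or o) are all assumptions made for illustration.

```python
from typing import Optional, Set

# Hypothetical bit layout, assumed for illustration only.
A, I, U = 0b0001, 0b0010, 0b0100   # vowel-class bits
LONG = 0b1000                      # length bit

VOWELS = {
    "a": A, "ā": A | LONG,
    "i": I, "ī": I | LONG,
    "u": U, "ū": U | LONG,
    "e": A | I,                    # modelled as the union of the a and i classes
    "o": A | U,                    # modelled as the union of the a and u classes
}
DECODE = {bits: letter for letter, bits in VOWELS.items()}


def join_vowels(first: str, second: str) -> Optional[str]:
    """Merge two adjacent vowels, or return None if neither rule applies."""
    x, y = VOWELS[first], VOWELS[second]
    cx, cy = x & ~LONG, y & ~LONG          # unset the length bit to get the class
    if cx == cy and cx in (A, I, U):
        return DECODE[cx | LONG]           # similar vowels: set the length bit
    if cx == A and cy in (I, U):
        return DECODE[A | cy]              # a/ā + i/ī -> e, a/ā + u/ū -> o
    return None                            # other rules are outside this sketch


def final_vowel_variants(word: str) -> Set[str]:
    """Forms a word's ending may take before a following vowel-initial word."""
    forms = {word}
    last = word[-1]
    if last in VOWELS:
        for following in VOWELS:
            merged = join_vowels(last, following)
            if merged is not None:
                forms.add(word[:-1] + merged)
    return forms
```

Under these assumptions, `final_vowel_variants("deva")` yields the four forms deva, devā, deve and devo, each of which could be used as a search prefix. A full treatment would need the remaining vowel rules and the handling of consonant-final words, which this sketch deliberately omits.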