An Efficient Indexing and Searching Technique for Information Retrieval for Urdu Language
This work addresses information retrieval challenges for Urdu language users, but it is incremental as it adapts existing indexing methods to a specific language.
The paper tackles information retrieval for the Urdu language by proposing an indexing technique built around a stemmer derived from morphological rules. It recommends creating indexes without stop words and storing them as ordered index files, and it compares results across different implementations.
Indexing techniques improve the retrieval of data in response to a given search condition, and inverted files are the most common way to build such indexes. This paper proposes an indexing technique for the Urdu language. Because the language-processing step of index creation differs from one language to another, we discuss the index creation steps specifically for Urdu. We explore the morphological rules of Urdu and implement them to create an Urdu stemmer. We implement the proposed technique in several ways and compare the results. We suggest that indexes be created without stop words and that the index file be an ordered index file.
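The pipeline the abstract describes (tokenize, drop stop words, stem, build an inverted file with ordered terms) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the stop-word list, the two plural suffixes, and the names `STOP_WORDS`, `SUFFIXES`, `stem`, `build_index`, and `search` are all hypothetical, and the paper's actual morphological rules are not reproduced here.

```python
from collections import defaultdict

# Illustrative stop-word list (an assumption, not the paper's list).
STOP_WORDS = {"کا", "کی", "کے", "میں", "ہے", "اور", "سے", "کو"}

# Two common plural suffixes, used only as a stand-in for the paper's
# morphological rules, which are not given in the abstract.
SUFFIXES = ("یں", "وں")


def stem(token):
    """Strip one known suffix if at least two characters remain."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def build_index(docs):
    """Build an inverted index {stem: [doc ids]} without stop words.

    Terms are inserted in sorted order (by Unicode code point, which is
    not a locale-aware Urdu collation) to mimic an ordered index file.
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():  # whitespace tokenization (assumption)
            if token in STOP_WORDS:
                continue
            postings[stem(token)].add(doc_id)
    return {term: sorted(postings[term]) for term in sorted(postings)}


def search(index, query):
    """Return ids of documents containing every non-stop-word query term."""
    result = None
    for token in query.split():
        if token in STOP_WORDS:
            continue
        ids = set(index.get(stem(token), []))
        result = ids if result is None else result & ids
    return sorted(result) if result else []
```

Keeping the terms ordered means an on-disk version of this index could be searched by binary search rather than a full scan, which is presumably the motivation for the ordered index file; the in-memory dictionary here only stands in for that layout.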