3 Papers

IR Feb 5, 2020
Experiments with Different Indexing Techniques for Text Retrieval tasks on Gujarati Language using Bag of Words Approach

Jyoti Pareek, Hardik Joshi, Krunal Chauhan et al.

This paper presents the results of experiments carried out to improve text retrieval for Gujarati text documents. Text retrieval involves searching and ranking text documents for a given set of query terms. We tested several retrieval models that use the bag-of-words approach, a traditional approach still in use today in which a text document is represented as a collection of words. Measures such as frequency count and inverse document frequency are used to identify and rank relevant documents for user queries. The ranking models were compared using mean average precision (MAP) as the metric. Because Gujarati is a morphologically rich language, we compared techniques such as stop word removal, stemming, and frequent case generation against a baseline to measure the improvement in information retrieval tasks. Most of these techniques are language dependent and require the development of language-specific tools. Using a plain unprocessed word index as the baseline, we observed significant improvements in MAP values after applying the different indexing techniques.
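As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of mean average precision in its standard textbook form. It is not taken from the paper: the function names, document ids, and sample data are hypothetical, and the authors' evaluation setup may differ.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query: the mean of precision@k taken at
    each rank k where a relevant document appears, divided by the total
    number of relevant documents."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical data: two queries with made-up document ids.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
```

Under this definition, an indexing technique improves on the baseline when the same queries produce a higher MAP against the same relevance judgments.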

IR Jan 18, 2020
Experiments on Manual Thesaurus based Query Expansion for Ad-hoc Monolingual Gujarati Information Retrieval Tasks

Hardik Joshi, Jyoti Pareek

In this paper, we present experimental work on Query Expansion (QE) for retrieval tasks on Gujarati text documents. In information retrieval, it is very difficult to estimate the exact user need; query expansion adds terms to the original query that provide more information about that need. There are various approaches to query expansion. In our work, manual thesaurus-based query expansion was performed to evaluate the performance of widely used information retrieval models on Gujarati text documents. Results show that query expansion improves the recall of text documents.
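The idea of thesaurus-based query expansion can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `expand_query` function, the dictionary layout of the thesaurus, and the English placeholder terms are all hypothetical, and the abstract does not describe how the authors' manual thesaurus is structured or applied.

```python
def expand_query(query_terms, thesaurus):
    """Return the original query terms followed by their thesaurus entries,
    preserving order and dropping duplicates."""
    expanded = []
    seen = set()
    for term in query_terms:
        # Keep the original term first, then any related terms listed for it.
        for candidate in [term, *thesaurus.get(term, [])]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Hypothetical hand-built thesaurus with placeholder English terms.
thesaurus = {
    "car": ["automobile", "vehicle"],
    "price": ["cost"],
}
expanded = expand_query(["car", "price", "india"], thesaurus)
```

Adding related terms lets the query match documents that use different wording for the same concept, which is consistent with the recall improvement the abstract reports.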

IR Jun 26, 2014
From Citation count to Argumentation count: a new metric to indicate the usefulness of an article

Hardik Joshi

Citation count is a quantifiable measure of the number of times an article is cited by other articles. It is commonly believed that an article cited often must be important or influential; however, there is no guarantee that the most cited articles are of good quality. In this paper, the author suggests argumentation count, a new metric for citation analysis. The proposed metric, argumentation count, is a triplet of quantities for each concept of an article that helps provide a quantifiable measure of the usefulness of an article.