Hiteshwar Kumar Azad

IR
3papers
435citations
Novelty32%
AI Score21

3 Papers

IRAug 27, 2019
A novel model for query expansion using pseudo-relevant web knowledge

Hiteshwar Kumar Azad, Akshay Deepak

In the field of information retrieval, query expansion (QE) has long been used as a technique to deal with the fundamental issue of word mismatch between a user's query and the target information. In the context of the relationship between the query and expanded terms, existing weighting techniques often fail to appropriately capture the term-term relationship and term to the whole query relationship, resulting in low retrieval effectiveness. Our proposed QE approach addresses this by proposing three weighting models based on (1) tf-itf, (2) k-nearest neighbor (kNN) based cosine similarity, and (3) correlation score. Further, to extract the initial set of expanded terms, we use pseudo-relevant web knowledge consisting of the top N web pages returned by the three popular search engines namely, Google, Bing, and DuckDuckGo, in response to the original query. Among the three weighting models, tf-itf scores each of the individual terms obtained from the web content, kNN-based cosine similarity scores the expansion terms to obtain the term-term relationship, and correlation score weighs the selected expansion terms with respect to the whole query. The proposed model, called web knowledge based query expansion (WKQE), achieves an improvement of 25.89% on the MAP score and 30.83% on the GMAP score over the unexpanded queries on the FIRE dataset. A comparative analysis of the WKQE techniques with other related approaches clearly shows significant improvement in the retrieval performance. We have also analyzed the effect of varying the number of pseudo-relevant documents and expansion terms on the retrieval effectiveness of the proposed model.

IRJan 29, 2019
A new approach for query expansion using Wikipedia and WordNet

Hiteshwar Kumar Azad, Akshay Deepak

Query expansion (QE) is a well-known technique used to enhance the effectiveness of information retrieval. QE reformulates the initial query by adding similar terms that help in retrieving more relevant results. Several approaches have been proposed in literature producing quite favorable results, but they are not evenly favorable for all types of queries (individual and phrase queries). One of the main reasons for this is the use of the same kind of data sources and weighting scheme while expanding both the individual and the phrase query terms. As a result, the holistic relationship among the query terms is not well captured or scored. To address this issue, we have presented a new approach for QE using Wikipedia and WordNet as data sources. Specifically, Wikipedia gives rich expansion terms for phrase terms, while WordNet does the same for individual terms. We have also proposed novel weighting schemes for expansion terms: in-link score (for terms extracted from Wikipedia) and a tf-idf based scheme (for terms extracted from WordNet). In the proposed Wikipedia-WordNet-based QE technique (WWQE), we weigh the expansion terms twice: first, they are scored by the weighting scheme individually, and then, the weighting scheme scores the selected expansion terms concerning the entire query using correlation score. The proposed approach gains improvements of 24% on the MAP score and 48% on the GMAP score over unexpanded queries on the FIRE dataset. Experimental results achieve a significant improvement over individual expansion and other related state-of-the-art approaches. We also analyzed the effect on retrieval effectiveness of the proposed technique by varying the number of expansion terms.

IRAug 1, 2017
Query expansion techniques for information retrieval: A survey

Hiteshwar Kumar Azad, Akshay Deepak

With the ever increasing size of the web, relevant information extraction on the Internet with a query formed by a few keywords has become a big challenge. Query Expansion (QE) plays a crucial role in improving searches on the Internet. Here, the user's initial query is reformulated by adding additional meaningful terms with similar significance. QE -- as part of information retrieval (IR) -- has long attracted researchers' attention. It has become very influential in the field of personalized social document, question answering, cross-language IR, information filtering and multimedia IR. Research in QE has gained further prominence because of IR dedicated conferences such as TREC (Text Information Retrieval Conference) and CLEF (Conference and Labs of the Evaluation Forum). This paper surveys QE techniques in IR from 1960 to 2017 with respect to core techniques, data sources used, weighting and ranking methodologies, user participation and applications -- bringing out similarities and differences.