IR · Dec 14, 2022
Explainability of Text Processing and Retrieval Methods: A Survey
Sourav Saha, Debapriyo Majumdar, Mandar Mitra
Deep Learning and Machine Learning based models have become extremely popular in text processing and information retrieval. However, the non-linear structures present inside the networks make these models largely inscrutable. A significant body of research has focused on increasing the transparency of these models. This article provides a broad overview of research on the explainability and interpretability of natural language processing and information retrieval methods. More specifically, we survey approaches that have been applied to explain word embeddings, sequence modeling, attention modules, transformers, BERT, and document ranking. The concluding section suggests some possible directions for future research on this topic.
CL · Feb 25
LiCQA: A Lightweight Complex Question Answering System
Sourav Saha, Dwaipayan Roy, Mandar Mitra
Over the last twenty years, significant progress has been made in designing and implementing Question Answering (QA) systems. However, addressing complex questions, the answers to which are spread across multiple documents, remains a challenging problem. Recent QA systems that are designed to handle complex questions work either on the basis of knowledge graphs, or utilise contemporary neural models that are expensive to train, in terms of both computational resources and the volume of training data required. In this paper, we present LiCQA, an unsupervised question answering model that works primarily on the basis of corpus evidence. We empirically compare the effectiveness and efficiency of LiCQA with two recently presented QA systems, which are based on different underlying principles. The results of our experiments show that LiCQA significantly outperforms these two state-of-the-art systems on benchmark data with noteworthy reduction in latency.
IR · Jan 13
Fine Grained Evaluation of LLMs-as-Judges
Sourav Saha, Mandar Mitra
A good deal of recent research has focused on how Large Language Models (LLMs) may be used as 'judges' in place of humans to evaluate the quality of the output produced by various text / image processing systems. Within this broader context, a number of studies have investigated the specific question of how effectively LLMs can be used as relevance assessors for the standard ad hoc task in Information Retrieval (IR). We extend these studies by looking at additional questions. Most importantly, we use a Wikipedia based test collection created by the INEX initiative, and prompt LLMs not only to judge whether documents are relevant / non-relevant, but also to highlight relevant passages in documents that they regard as useful. The human relevance assessors involved in creating this collection were given analogous instructions, i.e., they were asked to highlight all passages within a document that respond to the information need expressed in a query. This enables us not only to evaluate the quality of LLMs as judges at the document level, but also to quantify how often these 'judges' are right for the right reasons. Our findings suggest that LLMs-as-judges work best under human supervision.
IR · Feb 15, 2022
Deep-QPP: A Pairwise Interaction-based Deep Learning Model for Supervised Query Performance Prediction
Suchana Datta, Debasis Ganguly, Derek Greene et al.
Motivated by the recent success of end-to-end deep neural models for ranking tasks, we present here a supervised end-to-end neural approach for query performance prediction (QPP). In contrast to unsupervised approaches that rely on various statistics of document score distributions, our approach is entirely data-driven. Further, in contrast to weakly supervised approaches, our method also does not rely on the outputs from different QPP estimators. In particular, our model leverages information from the semantic interactions between the terms of a query and those in the top-documents retrieved with it. The architecture of the model comprises multiple layers of 2D convolution filters followed by a feed-forward layer of parameters. Experiments on standard test collections demonstrate that our proposed supervised approach outperforms other state-of-the-art supervised and unsupervised approaches.
IR · Feb 13, 2022
An Analysis of Variations in the Effectiveness of Query Performance Prediction
Debasis Ganguly, Suchana Datta, Mandar Mitra et al.
A query performance predictor estimates the retrieval effectiveness of an IR system for a given query. An important characteristic of QPP evaluation is that, since the ground truth retrieval effectiveness for QPP evaluation can be measured with different metrics, the ground truth itself is not absolute, which is in contrast to other retrieval tasks, such as that of ad-hoc retrieval. Motivated by this argument, the objective of this paper is to investigate how such variances in the ground truth for QPP evaluation can affect the outcomes of QPP experiments. We consider this not only in terms of the absolute values of the evaluation metrics being reported (e.g. Pearson's $r$, Kendall's $\tau$), but also with respect to the changes in the ranks of different QPP systems when ordered by the QPP metric scores. Our experiments reveal that the observed QPP outcomes can vary considerably, both in terms of the absolute evaluation metric values and also in terms of the relative system ranks. Through our analysis, we report the optimal combinations of QPP evaluation metric and experimental settings that are likely to lead to smaller variations in the observed results.
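To make the point about a non-absolute ground truth concrete, the following is a minimal sketch of how one set of QPP scores can correlate differently with two ground-truth effectiveness metrics. The function names, the per-query values, and the choice of AP and nDCG as the two metrics are illustrative assumptions, not data or code from the paper; Kendall's $\tau$ is computed here in its simple tau-a form.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson's r between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-query values (four queries), for illustration only.
predicted = [0.9, 0.4, 0.7, 0.2]      # QPP estimator output
ap = [0.50, 0.20, 0.35, 0.10]         # ground truth measured with AP
ndcg = [0.60, 0.55, 0.40, 0.30]       # ground truth measured with nDCG

tau_ap = kendall_tau(predicted, ap)
tau_ndcg = kendall_tau(predicted, ndcg)
print(f"tau vs AP:   {tau_ap:.3f}")
print(f"tau vs nDCG: {tau_ndcg:.3f}")
print(f"r vs AP:     {pearson_r(predicted, ap):.3f}")
```

In this toy setting the same predictor ranks the queries perfectly with respect to AP but swaps one pair with respect to nDCG, so the reported correlation depends on which metric defines the ground truth.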
IR · Apr 14, 2020
Tag Embedding Based Personalized Point Of Interest Recommendation System
Suraj Agrawal, Dwaipayan Roy, Mandar Mitra
Personalized Point of Interest recommendation is very helpful for satisfying users' needs at new places. In this article, we propose a tag embedding based method for personalized recommendation of Points of Interest. We model the relationship between the tags associated with Points of Interest. The model provides a representative embedding for each tag such that related tags lie closer together. We model each Point of Interest based on tag embeddings, and model each user (user profile) based on the Points of Interest rated by that user. Finally, we rank a user's candidate Points of Interest by the cosine similarity between the user's embedding and each Point of Interest's embedding. Further, we find the parameters required to model a user by discrete optimization over different measures (such as ndcg@5, MRR, ...). We also analyze the results obtained when the same parameters are used for all users and when individual parameters are used for each user. In addition, we analyze the effect on the results of changing the dataset used to model the relationship between tags. Our method also minimizes the privacy leak issue. We used the TREC Contextual Suggestion 2016 Phase 2 dataset and obtain significant improvements on all measures over the state-of-the-art method. The method improves ndcg@5 by 12.8%, p@5 by 4.3%, and MRR by 7.8%, which shows its effectiveness.
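The ranking step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how tag embeddings are aggregated, so averaging tag vectors per Point of Interest and taking a rating-weighted average for the user profile are assumptions, and the two-dimensional tag vectors, names, and ratings are hypothetical.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical 2-d tag embeddings; related tags are placed close together.
TAG_VECS = {
    "museum": [1.0, 0.0],
    "art": [0.9, 0.1],
    "bar": [0.0, 1.0],
    "nightlife": [0.1, 0.9],
}


def poi_vector(tags):
    """Assumption: a POI is the mean of its tag embeddings."""
    return mean_vector([TAG_VECS[t] for t in tags])


def user_vector(rated_pois):
    """Assumption: a user is the rating-weighted mean of rated POI vectors.

    rated_pois is a list of (tags, rating) pairs with positive ratings.
    """
    total = sum(rating for _, rating in rated_pois)
    dims = len(next(iter(TAG_VECS.values())))
    profile = [0.0] * dims
    for tags, rating in rated_pois:
        for i, value in enumerate(poi_vector(tags)):
            profile[i] += rating * value / total
    return profile


def rank_pois(user, candidates):
    """Sort candidate POIs by cosine similarity to the user, best first."""
    scored = [(name, cosine(user, poi_vector(tags)))
              for name, tags in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


user = user_vector([(["museum", "art"], 4), (["bar"], 1)])
ranking = rank_pois(user, {"club": ["nightlife", "bar"], "gallery": ["art"]})
print(ranking)
```

Here the user rated an art museum highly and a bar poorly, so the hypothetical gallery is ranked above the club.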
IR · Oct 25, 2017
Re-evaluating the need for Modelling Term-Dependence in Text Classification Problems
Sounak Banerjee, Prasenjit Majumder, Mandar Mitra
A substantial amount of research has been carried out in developing machine learning algorithms that account for term dependence in text classification. These algorithms offer acceptable performance in most cases, but they come at a substantial cost: they require significantly greater resources to operate. This paper argues that the higher costs of these algorithms are not justified by their performance in text classification problems. In order to prove the conjecture, the performance of one of the best dependence models is compared to several well established algorithms in text classification. A specific collection of datasets has been designed to best reflect the disparity in the nature of text data present in real world applications. The results show that even one of the best term dependence models performs decently at best when compared to independence models. Coupled with their substantially greater requirement for hardware resources, this makes term dependence models an impractical choice for real world scenarios.
IR · Jun 25, 2016
Representing Documents and Queries as Sets of Word Embedded Vectors for Information Retrieval
Dwaipayan Roy, Debasis Ganguly, Mandar Mitra et al.
A major difficulty in applying word vector embeddings in IR is in devising an effective and efficient strategy for obtaining representations of compound units of text, such as whole documents (as opposed to atomic words), for the purpose of indexing and scoring documents. Instead of striving for a suitable method for obtaining a single vector representation of a large document, we aim to develop a similarity metric that makes use of the similarities between the individual embedded word vectors in a document and a query. More specifically, we represent a document and a query as sets of word vectors, and use a standard notion of similarity between these sets, computed as a function of the similarities between each constituent word pair from these sets. We then make use of this similarity measure in combination with standard IR based similarities for document ranking. The results of our initial experimental investigations show that our proposed method improves MAP by up to $5.77\%$, in comparison to standard text-based language model similarity, on the TREC ad-hoc dataset.
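As a rough illustration of a set-based similarity of this kind, the sketch below scores a query against a document by averaging the cosine similarity over all query-word / document-word pairs and then mixes that score with a text-based score. The averaging function, the linear mixing weight `alpha`, and the toy vectors are assumptions made for illustration; the abstract does not specify the exact set similarity or combination used.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def set_similarity(query_vecs, doc_vecs):
    """Assumed set similarity: mean cosine over all (query, doc) word pairs."""
    total = sum(cosine(q, d) for q in query_vecs for d in doc_vecs)
    return total / (len(query_vecs) * len(doc_vecs))


def combined_score(text_score, embedding_score, alpha=0.5):
    """Assumed linear mix of a text-based score and the embedding score."""
    return alpha * text_score + (1 - alpha) * embedding_score


# Hypothetical 2-d word vectors for a one-word query and a two-word document.
query_vecs = [[1.0, 0.0]]
doc_vecs = [[1.0, 0.0], [0.0, 1.0]]

emb_sim = set_similarity(query_vecs, doc_vecs)
print(emb_sim)
print(combined_score(text_score=0.2, embedding_score=emb_sim, alpha=0.5))
```

Averaging over every word pair is quadratic in the set sizes, which is one reason an efficient variant matters for whole documents.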
IR · Jun 24, 2016
Using Word Embeddings for Automatic Query Expansion
Dwaipayan Roy, Debjyoti Paul, Mandar Mitra et al.
In this paper, a framework for Automatic Query Expansion (AQE) is proposed using the distributed neural language model word2vec. Using semantic and contextual relations in a distributed and unsupervised framework, word2vec learns a low dimensional embedding for each vocabulary entry. Using such a framework, we devise a query expansion technique in which terms related to a query are obtained by a K-nearest neighbor approach. We explore the performance of the AQE methods, with and without feedback query expansion, and of a variant of simple K-nearest neighbor in the proposed framework. Experiments on standard TREC ad-hoc data (Disk 4, 5 with query sets 301-450, 601-700) and web data (WT10G data with query set 451-550) show significant improvement over standard term-overlap based retrieval methods. However, the proposed method fails to achieve performance comparable to statistical co-occurrence based feedback methods such as RM3. We have also found that the word2vec based query expansion methods perform similarly with and without any feedback information.
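A K-nearest neighbor expansion of this kind can be sketched in a few lines. The toy embedding table below stands in for trained word2vec vectors, and scoring each candidate by its mean cosine similarity to the query terms is an assumption made for illustration rather than the paper's exact scoring function.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, embeddings, k=2):
    """Return the k vocabulary terms nearest to the query in embedding space.

    Assumption: a candidate's score is its mean cosine similarity to the
    query terms that have an embedding; query terms themselves are excluded.
    """
    known = [q for q in query_terms if q in embeddings]
    scores = {}
    for term, vec in embeddings.items():
        if term in query_terms or not known:
            continue
        sims = [cosine(vec, embeddings[q]) for q in known]
        scores[term] = sum(sims) / len(sims)
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical 2-d vectors standing in for trained word2vec embeddings.
EMBEDDINGS = {
    "car": [1.0, 0.1],
    "automobile": [0.95, 0.15],
    "vehicle": [0.9, 0.3],
    "banana": [0.0, 1.0],
}

expansion = expand_query(["car"], EMBEDDINGS, k=2)
print(expansion)
```

With real embeddings the same idea is usually served by a nearest-neighbor index rather than a full scan of the vocabulary.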
IR · Sep 18, 2015
Exploring Query Categorisation for Query Expansion: A Study
Dipasree Pal, Mandar Mitra, Samar Bhattacharya
The vocabulary mismatch problem is one of the important challenges facing traditional keyword-based Information Retrieval Systems. The aim of query expansion (QE) is to reduce this query-document mismatch by adding related or synonymous words or phrases to the query. Several existing query expansion algorithms have proved their merit, but they are not uniformly beneficial for all kinds of queries. Our long-term goal is to formulate methods for applying QE techniques tailored to individual queries, rather than applying the same general QE method to all queries. As an initial step, we have proposed a taxonomy of query classes (from a QE perspective) in this report. We have discussed the properties of each query class with examples. We have also discussed some QE strategies that might be effective for each query category. In future work, we intend to test the proposed techniques using standard datasets, and to explore automatic query categorisation methods.
IR · Sep 19, 2013
Improving Query Expansion Using WordNet
Dipasree Pal, Mandar Mitra, Kalyankumar Datta
This study proposes a new way of using WordNet for Query Expansion (QE). We choose candidate expansion terms, as usual, from a set of pseudo relevant documents; however, the usefulness of these terms is measured based on their definitions provided in a hand-crafted lexical resource like WordNet. Experiments with a number of standard TREC collections show that this method outperforms existing WordNet based methods. It also compares favorably with established QE methods such as KLD and RM3. Leveraging earlier work in which a combination of QE methods was found to outperform each individual method (as well as other well-known QE methods), we next propose a combination-based QE method that takes into account three different aspects of a candidate expansion term's usefulness: (i) its distribution in the pseudo relevant documents and in the target corpus, (ii) its statistical association with query terms, and (iii) its semantic relation with the query, as determined by the overlap between the WordNet definitions of the term and query terms. This combination of diverse sources of information appears to work well on a number of test collections, viz., TREC123, TREC5, TREC678, TREC robust new and TREC910 collections, and yields significant improvements over competing methods on most of these collections.
IR · Mar 4, 2013
Query Expansion Using Term Distribution and Term Association
Dipasree Pal, Mandar Mitra, Kalyankumar Datta
Good term selection is an important issue for an automatic query expansion (AQE) technique. AQE techniques that select expansion terms from the target corpus usually do so in one of two ways. Distribution based term selection compares the distribution of a term in the (pseudo) relevant documents with that in the whole corpus / random distribution. Two well-known distribution-based methods are based on Kullback-Leibler Divergence (KLD) and Bose-Einstein statistics (Bo1). Association based term selection, on the other hand, uses information about how a candidate term co-occurs with the original query terms. Local Context Analysis (LCA) and Relevance-based Language Model (RM3) are examples of association-based methods. Our goal in this study is to investigate how these two classes of methods may be combined to improve retrieval effectiveness. We propose the following combination-based approach. Candidate expansion terms are first obtained using a distribution based method. This set is then refined based on the strength of the association of terms with the original query terms. We test our methods on 11 TREC collections. The proposed combinations generally yield better results than each individual method, as well as other state-of-the-art AQE approaches. En route to our primary goal, we also propose some modifications to LCA and Bo1 which lead to improved performance.
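The two-stage combination described above (distribution-based candidates, then association-based refinement) can be sketched as follows. The KLD term score $p_{fb}(t)\,\log\frac{p_{fb}(t)}{p_{coll}(t)}$ is the standard form of that method; the association measure used here (the number of feedback documents in which a term co-occurs with a query term), the function names, and the tiny corpus are simplifying assumptions and do not reproduce the paper's LCA or RM3 variants.

```python
import math
from collections import Counter


def kld_scores(feedback_docs, collection_docs):
    """KLD score per term: p_fb(t) * log(p_fb(t) / p_coll(t))."""
    fb = Counter(t for doc in feedback_docs for t in doc)
    coll = Counter(t for doc in collection_docs for t in doc)
    fb_total, coll_total = sum(fb.values()), sum(coll.values())
    scores = {}
    for term, freq in fb.items():
        if coll[term] == 0:
            continue  # term unseen in the collection: skip in this sketch
        p_fb = freq / fb_total
        p_coll = coll[term] / coll_total
        scores[term] = p_fb * math.log(p_fb / p_coll)
    return scores


def association(term, query_terms, feedback_docs):
    """Assumed association: feedback docs where term meets a query term."""
    return sum(1 for doc in feedback_docs
               if term in doc and any(q in doc for q in query_terms))


def select_expansion_terms(query_terms, feedback_docs, collection_docs,
                           n_candidates=5, n_final=2):
    """Stage 1: top KLD candidates. Stage 2: re-rank by association."""
    scores = kld_scores(feedback_docs, collection_docs)
    candidates = sorted((t for t in scores if t not in query_terms),
                        key=scores.get, reverse=True)[:n_candidates]
    return sorted(candidates,
                  key=lambda t: association(t, query_terms, feedback_docs),
                  reverse=True)[:n_final]


# Hypothetical toy corpus; the first two documents act as pseudo-relevant.
COLLECTION = [
    ["jaguar", "car", "speed"],
    ["jaguar", "car", "engine"],
    ["jaguar", "cat", "jungle"],
    ["banana", "fruit", "yellow"],
    ["fruit", "market", "price"],
]
FEEDBACK = COLLECTION[:2]

selected = select_expansion_terms(["jaguar"], FEEDBACK, COLLECTION)
print(selected)
```

In this toy example "car" is favoured by both stages: it is over-represented in the feedback documents and co-occurs with the query term in both of them.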