IR | Feb 4, 2016

Improved Query Topic Models via Pseudo-Relevant Pólya Document Models

arXiv:1602.01665v1 | 2.74 citations | Has Code

Originality: Incremental advance

AI Analysis

This work addresses vocabulary mismatch in information retrieval systems and represents an incremental improvement over existing query expansion techniques.

The paper tackles vocabulary mismatch in information retrieval with a new method that estimates query topic models from pseudo-relevant documents within a multivariate Pólya mixture framework. On TREC collections, the method compares favourably with state-of-the-art expansion methods.

Query-expansion via pseudo-relevance feedback is a popular method of overcoming the problem of vocabulary mismatch and of increasing average retrieval effectiveness. In this paper, we develop a new method that estimates a query topic model from a set of pseudo-relevant documents using a new language modelling framework. We assume that documents are generated via a mixture of multivariate Polya distributions, and we show that by identifying the topical terms in each document, we can appropriately select terms that are likely to belong to the query topic model. The results of experiments on several TREC collections show that the new approach compares favourably to current state-of-the-art expansion methods.
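The modelling assumption above rests on the multivariate Pólya distribution, also known as the Dirichlet compound multinomial (DCM). For a document with term counts n_w (total N) and positive parameters alpha_w (sum alpha_0), its probability is N!/prod(n_w!) * Gamma(alpha_0)/Gamma(N + alpha_0) * prod Gamma(n_w + alpha_w)/Gamma(alpha_w). The sketch below is not the authors' code: it only evaluates this single-distribution probability, with hypothetical parameters, to show the "burstiness" property (a term that appears once is likely to reappear) that distinguishes it from a multinomial. The mixture model, the identification of topical terms, and the expansion-term selection described in the paper are not reproduced. The function names `polya_log_prob` and `multinomial_log_prob` are illustrative only.

```python
import math


def polya_log_prob(counts, alpha):
    """Log-probability of a term-count vector under a multivariate Pólya (DCM).

    counts: dict term -> count (terms absent from the dict have count 0).
    alpha:  dict term -> positive parameter; its keys define the vocabulary.
    """
    n_total = sum(counts.values())
    alpha_total = sum(alpha.values())
    # Multinomial coefficient: N! / prod(n_w!)
    log_p = math.lgamma(n_total + 1) - sum(math.lgamma(c + 1) for c in counts.values())
    # Gamma(alpha_0) / Gamma(N + alpha_0)
    log_p += math.lgamma(alpha_total) - math.lgamma(n_total + alpha_total)
    # prod Gamma(n_w + alpha_w) / Gamma(alpha_w); terms with n_w = 0 contribute 1
    for term, c in counts.items():
        a = alpha[term]
        log_p += math.lgamma(c + a) - math.lgamma(a)
    return log_p


def multinomial_log_prob(counts, probs):
    """Log-probability of the same count vector under a multinomial, for contrast."""
    n_total = sum(counts.values())
    log_p = math.lgamma(n_total + 1) - sum(math.lgamma(c + 1) for c in counts.values())
    log_p += sum(c * math.log(probs[t]) for t, c in counts.items())
    return log_p


# Hypothetical 4-term vocabulary with small, equal Pólya parameters.
alpha = {t: 0.1 for t in "abcd"}
probs = {t: 0.25 for t in "abcd"}  # multinomial with the same expected term shares

bursty = {"a": 4}                          # one term repeated
spread = {"a": 1, "b": 1, "c": 1, "d": 1}  # four distinct terms

print(math.exp(polya_log_prob(bursty, alpha)))        # ≈ 0.157
print(math.exp(polya_log_prob(spread, alpha)))        # ≈ 0.00053
print(math.exp(multinomial_log_prob(bursty, probs)))  # ≈ 0.0039
print(math.exp(multinomial_log_prob(spread, probs)))  # ≈ 0.094
```

With small alpha values the Pólya assigns far more probability to the repeated-term document than to the spread-out one, while the multinomial does the opposite. This tendency of terms to recur within a document is what makes it plausible to separate a document's own topical terms from background terms, as the abstract describes, although how the paper does so is not shown here.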

