In-Chan Choi

IR, Jul 30, 2015

Generalized Ensemble Model for Document Ranking in Information Retrieval

Yanshan Wang, In-Chan Choi, Hongfang Liu

A generalized ensemble model (gEnM) for document ranking is proposed in this paper. The gEnM linearly combines basis document retrieval models and aims to rank relevant documents at high positions. To obtain the optimal linear combination of multiple document retrieval models, or rankers, an optimization program is formulated that directly maximizes the mean average precision (MAP). Both supervised and unsupervised learning algorithms are presented to solve this program. For the supervised scheme, two approaches are considered depending on the data setting, namely the batch and online settings. In the batch setting, we propose a revised Newton's algorithm, gEnM.BAT, which approximates the derivative and the Hessian matrix. In the online setting, we advocate gEnM.ON, an algorithm based on stochastic gradient descent (SGD). For the unsupervised scheme, we present an unsupervised ensemble model (UnsEnM) that iteratively co-learns from each constituent ranker. Experiments on benchmark data sets verify the effectiveness of the proposed algorithms. Therefore, with appropriate algorithms, the gEnM is a viable option for diverse practical information retrieval applications.
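The objective described above can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not the authors' code: each ranker is assumed to return a score per document, the ensemble score is the weighted sum of those scores, and MAP is computed over the resulting rankings. The names `combine_scores`, `average_precision`, `mean_average_precision`, and `grid_search_two_rankers` are hypothetical, the scores are toy values, and the grid search is only a naive stand-in for weight optimization; it is not gEnM.BAT, gEnM.ON, or UnsEnM.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical query format: (per-ranker score dicts, set of relevant doc ids).
Query = Tuple[Sequence[Dict[str, float]], Set[str]]


def combine_scores(score_lists: Sequence[Dict[str, float]],
                   weights: Sequence[float]) -> Dict[str, float]:
    """Linear ensemble: score(d) = sum_j weights[j] * score_lists[j][d]."""
    combined: Dict[str, float] = {}
    for w, scores in zip(weights, score_lists):
        for doc, s in scores.items():
            # A document missing from a ranker gets no contribution from it.
            combined[doc] = combined.get(doc, 0.0) + w * s
    return combined


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Sum of precision@k at each relevant position k, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(queries: Sequence[Query],
                           weights: Sequence[float]) -> float:
    """MAP of the weighted ensemble over all queries."""
    aps = []
    for score_lists, relevant in queries:
        combined = combine_scores(score_lists, weights)
        # Sort by descending score; break ties by doc id for determinism.
        ranking = sorted(combined, key=lambda d: (-combined[d], d))
        aps.append(average_precision(ranking, relevant))
    return sum(aps) / len(aps)


def grid_search_two_rankers(queries: Sequence[Query],
                            steps: int = 10) -> Tuple[float, float]:
    """Naive stand-in optimizer: try weights (t, 1 - t), keep the first best MAP."""
    grid = [i / steps for i in range(steps + 1)]
    best_t = max(grid, key=lambda t: mean_average_precision(queries, (t, 1.0 - t)))
    return best_t, mean_average_precision(queries, (best_t, 1.0 - best_t))


if __name__ == "__main__":
    ranker_a = {"a": 0.9, "b": 0.5, "c": 0.1}  # toy scores, not real data
    ranker_b = {"a": 0.1, "b": 0.2, "c": 0.9}
    queries = [([ranker_a, ranker_b], {"c"})]
    print(mean_average_precision(queries, (1.0, 0.0)))
    print(grid_search_two_rankers(queries))
```

Because MAP changes only when the ranking changes, it is piecewise constant in the weights, so a gradient-based method such as the ones named above needs some approximation of its derivatives; an exhaustive grid like this one only scales to a handful of rankers.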

IR, Sep 13, 2013

Indexing by Latent Dirichlet Allocation and Ensemble Model

Yanshan Wang, Jae-Sung Lee, In-Chan Choi

The contribution of this paper is twofold. First, we present Indexing by Latent Dirichlet Allocation (LDI), an automatic document indexing method. The probability distributions in LDI build on those in Latent Dirichlet Allocation (LDA), a generative topic model that has previously been applied to document retrieval tasks. However, such ad hoc applications, or their variants with the smoothing techniques suggested by previous studies in LDA-based language modeling, yield unsatisfactory performance because the document representations do not accurately reflect the concept space. To improve performance, we introduce a new definition of document probability vectors in the context of LDA and present a novel LDA-based scheme for automatic document indexing. Second, we propose an Ensemble Model (EnM) for document retrieval. The EnM combines basis indexing models by assigning them different weights and seeks the weights that maximize the Mean Average Precision (MAP). To solve this optimization problem, we propose an algorithm, EnM.B, derived from the boosting method. The results of our computational experiments on benchmark data sets indicate that both proposed approaches are viable options for document retrieval.
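For context, one common "ad hoc" way of applying LDA to retrieval, of the kind the abstract says performs poorly, treats each document as a topic mixture with `p(w|d) = sum_k p(w|z=k) p(z=k|d)` and ranks documents by query likelihood. The Python below is a minimal sketch under stated assumptions: the topic-word distributions `phi` and document-topic vectors `thetas` are toy values (in practice they come from a trained LDA model), the helper names are hypothetical, and the probability floor is a placeholder rather than a real smoothing method. It is not LDI's new definition of document probability vectors, which the abstract does not specify, and it does not implement EnM or EnM.B.

```python
import math
from typing import Dict, List, Sequence

# Assumed inputs: phi[k] maps word -> p(w | z=k); thetas maps doc id -> p(z | d).


def lda_word_prob(word: str, theta_d: Sequence[float],
                  phi: Sequence[Dict[str, float]]) -> float:
    """p(w | d) = sum_k p(w | z=k) * p(z=k | d) under a trained LDA model."""
    return sum(t * phi_k.get(word, 0.0) for t, phi_k in zip(theta_d, phi))


def query_log_likelihood(query: List[str], theta_d: Sequence[float],
                         phi: Sequence[Dict[str, float]],
                         floor: float = 1e-12) -> float:
    """log p(q | d), treating query words as independent.

    The floor only prevents log(0); it is a crude placeholder for the
    smoothing used in LDA-based language modeling, not a real smoothing method.
    """
    return sum(math.log(max(lda_word_prob(w, theta_d, phi), floor)) for w in query)


def rank_documents(query: List[str], thetas: Dict[str, Sequence[float]],
                   phi: Sequence[Dict[str, float]]) -> List[str]:
    """Return doc ids sorted by descending query likelihood (ties broken by id)."""
    scores = {d: query_log_likelihood(query, th, phi) for d, th in thetas.items()}
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    phi = [{"apple": 0.6, "fruit": 0.4}, {"engine": 0.7, "car": 0.3}]  # 2 toy topics
    thetas = {"d1": [0.9, 0.1], "d2": [0.2, 0.8]}
    print(rank_documents(["apple", "fruit"], thetas, phi))
```

With these toy values, the query `["apple", "fruit"]` ranks `d1` first, since its topic mixture is dominated by the topic that generates those words.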
