The Opposite of Smoothing: A Language Model Approach to Ranking Query-Specific Document Clusters
This work addresses the challenge of improving retrieval precision in information retrieval systems, though the contribution appears incremental because it builds on existing cluster ranking methods.
The paper tackles the problem of improving precision in document retrieval by ranking query-specific clusters by the presumed percentage of relevant documents they contain, using a novel language model that incorporates both cluster and document information. The model substantially outperforms previous cluster ranking approaches, and the document ranking it produces yields better precision at the top ranks than the initial ranking while comparing favorably with a state-of-the-art pseudo-feedback method.
Exploiting information induced from (query-specific) clustering of top-retrieved documents has long been proposed as a means of improving precision at the very top ranks of the returned results. We present a novel language model approach to ranking query-specific clusters by the presumed percentage of relevant documents that they contain. While most previous cluster ranking approaches focus on the cluster as a whole, our model also utilizes information induced from the documents associated with the cluster. Our model substantially outperforms previous approaches for identifying clusters containing a high relevant-document percentage. Furthermore, using the model to produce a document ranking yields precision-at-top-ranks performance that is consistently better than that of the initial ranking upon which clustering is performed. The performance also compares favorably with that of a state-of-the-art pseudo-feedback-based retrieval method.
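To make the idea of combining cluster-level and document-level evidence concrete, the following is a minimal illustrative sketch, not the model proposed in the paper. It assumes a Dirichlet-smoothed unigram query likelihood and a simple linear interpolation between the score of the cluster treated as one large document and the average score of its member documents. The function names, the interpolation weight `lam`, the smoothing parameter `mu`, and the toy data are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def unigram_lm(tokens, collection_counts, collection_len, mu):
    """Build a Dirichlet-smoothed unigram model p(w | text) (assumed estimator)."""
    counts = Counter(tokens)
    length = len(tokens)
    vocab_size = len(collection_counts)

    def prob(word):
        # Add-one floor on the collection model so unseen words keep non-zero mass.
        p_coll = (collection_counts[word] + 1) / (collection_len + vocab_size)
        return (counts[word] + mu * p_coll) / (length + mu)

    return prob


def query_likelihood(query, lm):
    """Probability of generating all query terms from the language model."""
    return math.exp(sum(math.log(lm(w)) for w in query))


def rank_clusters(query, docs, clusters, lam=0.5, mu=10.0):
    """Rank clusters by an interpolation of cluster-level and document-level scores.

    lam weights the cluster-as-a-whole score; (1 - lam) weights the mean score
    of the member documents. This combination is an assumption of the sketch.
    """
    collection_counts = Counter(w for toks in docs.values() for w in toks)
    collection_len = sum(collection_counts.values())

    scores = {}
    for cid, members in clusters.items():
        if not members:
            continue  # skip empty clusters; nothing to score
        # Cluster-level evidence: concatenate member documents into one text.
        cluster_tokens = [w for d in members for w in docs[d]]
        cluster_lm = unigram_lm(cluster_tokens, collection_counts, collection_len, mu)
        cluster_score = query_likelihood(query, cluster_lm)
        # Document-level evidence: average query likelihood over member documents.
        doc_score = sum(
            query_likelihood(
                query, unigram_lm(docs[d], collection_counts, collection_len, mu)
            )
            for d in members
        ) / len(members)
        scores[cid] = lam * cluster_score + (1 - lam) * doc_score

    return sorted(scores, key=scores.get, reverse=True)


# Toy data (hypothetical): four tokenized documents grouped into two clusters.
docs = {
    "d1": ["language", "model", "ranking"],
    "d2": ["cluster", "ranking", "language"],
    "d3": ["cooking", "pasta", "recipe"],
    "d4": ["pasta", "sauce", "tomato"],
}
clusters = {"c1": ["d1", "d2"], "c2": ["d3", "d4"]}
ranking = rank_clusters(["language", "ranking"], docs, clusters)
```

In this toy setting the cluster whose documents contain the query terms is ranked first. A document ranking could then be derived, for instance, by listing the documents of the top-ranked clusters in order, though how the paper itself performs that step is not stated in the abstract.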