The Simplest Thing That Can Possibly Work: Pseudo-Relevance Feedback Using Text Classification
This paper addresses the problem of improving retrieval effectiveness in information retrieval. The contribution is incremental, since it combines existing elements.
The paper tackles pseudo-relevance feedback by training a simple document relevance classifier on pseudo-labels from an initial ranked list and using it to rerank the retrieved documents, which yields significant improvements across multiple newswire collections.
Motivated by recent commentary that has questioned today's pursuit of ever-more complex models and mathematical formalisms in applied machine learning and whether meaningful empirical progress is actually being made, this paper tries to tackle the decades-old problem of pseudo-relevance feedback with "the simplest thing that can possibly work". I present a technique based on training a document relevance classifier for each information need using pseudo-labels from an initial ranked list and then applying the classifier to rerank the retrieved documents. Experiments demonstrate significant improvements across a number of newswire collections, with initial rankings supplied by "bag of words" BM25 as well as from a well-tuned query expansion model. While this simple technique draws elements from several well-known threads in the literature, to my knowledge this exact combination has not previously been proposed and evaluated.
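To make the pipeline described above concrete, here is a minimal sketch in Python. The abstract does not specify how pseudo-labels are chosen, which features are used, or which classifier is trained, so everything below is an assumption made for illustration: the top k documents of the initial ranking are treated as positive examples and the bottom k as negative, documents are represented as normalized bag-of-words term counts, and the classifier is a small logistic regression trained with plain gradient steps. The function names (`featurize`, `train_logreg`, `predict`, `rerank`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def featurize(text):
    """Bag-of-words term counts, L2-normalized (an illustrative choice)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def train_logreg(examples, epochs=200, lr=0.5, l2=0.01):
    """Train a sparse logistic regression with stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for features, label in examples:
            z = bias + sum(weights.get(t, 0.0) * v for t, v in features.items())
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - label  # gradient of the log loss with respect to z
            for t, v in features.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (grad * v + l2 * w)
            bias -= lr * grad
    return weights, bias


def predict(model, features):
    """Probability that a document is relevant under the trained model."""
    weights, bias = model
    z = bias + sum(weights.get(t, 0.0) * v for t, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def rerank(ranked_docs, k=2):
    """Rerank (doc_id, text) pairs given in initial-retrieval order.

    Assumption: the top k documents are pseudo-positive and the
    bottom k are pseudo-negative. One classifier is trained per query.
    """
    feats = [(doc_id, featurize(text)) for doc_id, text in ranked_docs]
    positives = [(f, 1.0) for _, f in feats[:k]]
    negatives = [(f, 0.0) for _, f in feats[-k:]]
    model = train_logreg(positives + negatives)
    scored = [(doc_id, predict(model, f)) for doc_id, f in feats]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy initial ranking for a hypothetical query about the animal "jaguar".
initial_ranking = [
    ("d1", "jaguar cat habitat rainforest"),
    ("d2", "jaguar cat prey hunting"),
    ("d3", "jaguar car dealership prices"),
    ("d4", "jaguar cat cubs rainforest"),
    ("d5", "jaguar car engine review"),
    ("d6", "jaguar car lease prices"),
]

reranked = rerank(initial_ranking, k=2)
for doc_id, score in reranked:
    print(doc_id, round(score, 3))
```

In this toy ranking, the classifier learns from the pseudo-labels that "cat" signals relevance and "car" signals non-relevance, so the on-topic document d4 moves ahead of the off-topic document d3. A fuller version might combine the classifier score with the initial retrieval score rather than replacing it; that, too, is a design choice not stated in the abstract.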