IR, Apr 18, 2019

The Simplest Thing That Can Possibly Work: Pseudo-Relevance Feedback Using Text Classification

arXiv:1904.08861v1, 12 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of improving ranking effectiveness in information retrieval. Its contribution is incremental because it combines existing elements rather than introducing new ones.

The paper tackles pseudo-relevance feedback by using a simple document relevance classifier trained on pseudo-labels from an initial ranked list, resulting in significant improvements across multiple newswire collections.

Motivated by recent commentary that has questioned today's pursuit of ever-more complex models and mathematical formalisms in applied machine learning and whether meaningful empirical progress is actually being made, this paper tries to tackle the decades-old problem of pseudo-relevance feedback with "the simplest thing that can possibly work". I present a technique based on training a document relevance classifier for each information need using pseudo-labels from an initial ranked list and then applying the classifier to rerank the retrieved documents. Experiments demonstrate significant improvements across a number of newswire collections, with initial rankings supplied by "bag of words" BM25 as well as from a well-tuned query expansion model. While this simple technique draws elements from several well-known threads in the literature, to my knowledge this exact combination has not previously been proposed and evaluated.
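The reranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how pseudo-labels are chosen, which features or classifier are used, or how scores are combined, so everything below is an assumption made for illustration: the top `k` documents of the initial list are treated as pseudo-positive and the bottom `k` as pseudo-negative, features are length-normalized term frequencies, the classifier is a small hand-written logistic regression, and the final score interpolates the classifier probability with the min-max normalized initial score using a hypothetical weight `alpha`. All function names are hypothetical.

```python
import math
from collections import Counter


def featurize(text):
    """Length-normalized term frequencies (an assumed feature choice)."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {t: c / len(tokens) for t, c in counts.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, feats):
    """Classifier probability that a document is relevant."""
    return sigmoid(bias + sum(weights.get(t, 0.0) * v for t, v in feats.items()))


def train_logreg(examples, epochs=200, lr=0.5, l2=0.01):
    """Plain SGD logistic regression over sparse feature dicts."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for feats, label in examples:
            grad = predict(weights, bias, feats) - label
            for t, v in feats.items():
                w = weights.get(t, 0.0)
                weights[t] = w - lr * (grad * v + l2 * w)
            bias -= lr * grad
    return weights, bias


def rerank(ranked, texts, k=2, alpha=0.5):
    """Rerank an initial list with a per-query classifier.

    ranked: list of (doc_id, score), sorted by descending initial score.
    texts:  dict mapping doc_id to document text.
    k, alpha: assumed hyperparameters (pseudo-label depth, interpolation weight).
    """
    feats = {doc_id: featurize(texts[doc_id]) for doc_id, _ in ranked}
    # Assumption: top-k are pseudo-relevant, bottom-k are pseudo-non-relevant.
    examples = [(feats[d], 1) for d, _ in ranked[:k]]
    examples += [(feats[d], 0) for d, _ in ranked[-k:]]
    weights, bias = train_logreg(examples)

    scores = [s for _, s in ranked]
    low, span = min(scores), (max(scores) - min(scores)) or 1.0
    combined = []
    for doc_id, score in ranked:
        clf = predict(weights, bias, feats[doc_id])
        combined.append((doc_id, alpha * clf + (1 - alpha) * (score - low) / span))
    return sorted(combined, key=lambda pair: pair[1], reverse=True)


# Toy initial ranking for a query about solar energy; d3 is an off-topic
# document that the initial ranker placed above the on-topic d4.
texts = {
    "d1": "solar panel energy efficiency",
    "d2": "solar energy panel installation",
    "d3": "football match results",
    "d4": "solar panel energy storage",
    "d5": "football league match schedule",
    "d6": "football match ticket prices",
}
initial = [("d1", 10.0), ("d2", 9.0), ("d3", 8.0), ("d4", 7.0), ("d5", 3.0), ("d6", 2.0)]
reranked = rerank(initial, texts)
print([doc_id for doc_id, _ in reranked])
```

In this toy setup the classifier learns that vocabulary shared with the top-ranked documents signals relevance, so the on-topic document d4 moves above the off-topic d3. Because the classifier is trained separately for each information need, it only has to generalize within a single ranked list.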
