IR · CL · Jan 5, 2022

Atomized Search Length: Beyond User Models

arXiv:2201.01745v1
AI Analysis

This work addresses a fundamental measurement problem in information retrieval, one that could shape the development of stronger IR systems, though its contribution, a new metric, is incremental.

The paper argues that current IR metrics, which are modeled on user experience, undersample deeper relevant documents, and it introduces a new metric, 'atomized search length', to address this. Reanalyzing over 70 TREC tracks, the authors find that roughly half undersample top-ranked documents and nearly all undersample tail documents; in the 2020 Deep Learning tracks, neural systems were near-optimal on top-ranked documents but showed only modest gains over BM25 on tail documents.

We argue that current IR metrics, modeled on optimizing user experience, measure too narrow a portion of the IR space. If IR systems are weak, these metrics undersample or completely filter out the deeper documents that need improvement. If IR systems are relatively strong, these metrics undersample deeper relevant documents that could underpin even stronger IR systems, ones that could present content from tens or hundreds of relevant documents in a user-digestible hierarchy or text summary. We reanalyze over 70 TREC tracks from the past 28 years, showing that roughly half undersample top-ranked documents and nearly all undersample tail documents. We show that in the 2020 Deep Learning tracks, neural systems were actually near-optimal at top-ranked documents, compared to only modest gains over BM25 on tail documents. Our analysis is based on a simple new systems-oriented metric, 'atomized search length', which is capable of accurately and evenly measuring all relevant documents at any depth.

