IR | CL | Apr 23, 2024

A Reproducibility Study of PLAID

arXiv:2404.14989v1 | 12.015 citations | h-index: 26 | SIGIR

Originality: Synthesis-oriented

AI Analysis

This is an incremental study that highlights the importance of baseline selection for evaluating retrieval engine efficiency in information retrieval.

This study reproduces and extends the PLAID algorithm for ColBERTv2, finding that re-ranking BM25 results with ColBERTv2 offers better efficiency-effectiveness trade-offs in low-latency settings, but modifications to include neighbor documents are needed for peak effectiveness across all operational points.

The PLAID (Performance-optimized Late Interaction Driver) algorithm for ColBERTv2 uses clustered term representations to retrieve and progressively prune documents for final (exact) document scoring. In this paper, we reproduce and fill in missing gaps from the original work. By studying the parameters PLAID introduces, we find that its Pareto frontier is formed of a careful balance among its three parameters; deviations beyond the suggested settings can substantially increase latency without necessarily improving its effectiveness. We then compare PLAID with an important baseline missing from the paper: re-ranking a lexical system. We find that applying ColBERTv2 as a re-ranker atop an initial pool of BM25 results provides better efficiency-effectiveness trade-offs in low-latency settings. However, re-ranking cannot reach peak effectiveness at higher latency settings due to limitations in recall of lexical matching and provides a poor approximation of an exhaustive ColBERTv2 search. We find that recently proposed modifications to re-ranking that pull in the neighbors of top-scoring documents overcome this limitation, providing a Pareto frontier across all operational points for ColBERTv2 when evaluated using a well-annotated dataset. Curious about why re-ranking methods are highly competitive with PLAID, we analyze the token representation clusters PLAID uses for retrieval and find that most clusters are predominantly aligned with a single token and vice versa. Given the competitive trade-offs that re-ranking baselines exhibit, this work highlights the importance of carefully selecting pertinent baselines when evaluating the efficiency of retrieval engines.
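The two re-ranking baselines described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function names (`maxsim_score`, `rerank`, `rerank_with_neighbors`), the toy two-dimensional token vectors, the `neighbors` graph, and the `budget` parameter are all hypothetical, and the BM25 pool is assumed to be given as a list of document IDs. The scoring function follows the late-interaction (MaxSim) idea used by ColBERT-style models, and the neighbor step is a simplified one-pass variant of the idea of pulling in neighbors of top-scoring documents.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late-interaction score: each query token takes its best-matching
    document token, and the per-token maxima are summed."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rerank(query_vecs: List[Vector], candidates: List[str],
           doc_index: Dict[str, List[Vector]], k: int) -> List[Tuple[str, float]]:
    """Score every candidate exactly and return the top k."""
    scored = [(doc_id, maxsim_score(query_vecs, doc_index[doc_id]))
              for doc_id in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank_with_neighbors(query_vecs: List[Vector], candidates: List[str],
                          doc_index: Dict[str, List[Vector]],
                          neighbors: Dict[str, List[str]],
                          k: int, budget: int) -> List[Tuple[str, float]]:
    """Score the initial pool, then also score neighbors of the best
    initial documents until `budget` documents have been scored.
    Simplified assumption: a single expansion pass in score order."""
    scores: Dict[str, float] = {}
    for doc_id in candidates[:budget]:
        scores[doc_id] = maxsim_score(query_vecs, doc_index[doc_id])
    # sorted() materializes the order first, so adding to `scores` is safe.
    for doc_id in sorted(scores, key=scores.get, reverse=True):
        for nb in neighbors.get(doc_id, []):
            if len(scores) >= budget:
                break
            if nb not in scores:
                scores[nb] = maxsim_score(query_vecs, doc_index[nb])
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Toy data (hypothetical): "d3" is relevant but missing from the lexical pool.
query = [[1.0, 0.0], [0.0, 1.0]]
index = {
    "d1": [[1.0, 0.0], [0.0, 0.2]],
    "d2": [[0.5, 0.5]],
    "d3": [[1.0, 0.0], [0.0, 1.0]],
    "d4": [[0.1, 0.1]],
}
bm25_pool = ["d2", "d1"]
graph = {"d1": ["d3"], "d2": ["d4"]}

plain = rerank(query, bm25_pool, index, k=2)
expanded = rerank_with_neighbors(query, bm25_pool, index, graph, k=2, budget=3)
print(plain)
print(expanded)
```

In this toy setup, plain re-ranking can only reorder what the lexical pool already contains, whereas the neighbor-aware variant can surface a document the pool missed. This mirrors the recall limitation the abstract attributes to re-ranking a lexical system.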
