IR · CL · Jun 28, 2021

A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques

arXiv:2106.14807v1 · 185 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work provides a framework for understanding retrieval methods and offers an incremental improvement to sparse retrieval in the form of uniCOIL, a simple extension of COIL.

The paper organizes recent information retrieval techniques into a conceptual framework built on two contrasts, sparse vs. dense and unsupervised vs. learned representations. It also introduces uniCOIL, which the authors report as, to their knowledge, the state of the art in sparse retrieval on the MS MARCO passage ranking dataset at the time of writing.

Recent developments in representational learning for information retrieval can be organized in a conceptual framework that establishes two pairs of contrasts: sparse vs. dense representations and unsupervised vs. learned representations. Sparse learned representations can further be decomposed into expansion and term weighting components. This framework allows us to understand the relationship between recently proposed techniques such as DPR, ANCE, DeepCT, DeepImpact, and COIL, and furthermore, gaps revealed by our analysis point to "low hanging fruit" in terms of techniques that have yet to be explored. We present a novel technique dubbed "uniCOIL", a simple extension of COIL that achieves to our knowledge the current state-of-the-art in sparse retrieval on the popular MS MARCO passage ranking dataset. Our implementation using the Anserini IR toolkit is built on the Lucene search library and thus fully compatible with standard inverted indexes.
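To make the two components of a sparse learned representation concrete, the following is a minimal sketch of scoring over an inverted index, where each document is stored as a bag of terms with one scalar weight per term and a query is scored by summing products of matching weights. Everything in it is an assumption for illustration: the function names, the documents, and the weights are hypothetical (real systems learn the weights with a neural model), and this is not the paper's code or Anserini's API.

```python
from collections import defaultdict


def build_inverted_index(doc_weights):
    """Map each term to a postings list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, weights in doc_weights.items():
        for term, weight in weights.items():
            index[term].append((doc_id, weight))
    return index


def score(query_weights, index):
    """Sum query_weight * doc_weight over the terms a query and a document share."""
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Highest score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical term weights (term weighting component).
# "automobile" in d1 stands in for a term added by document expansion.
doc_weights = {
    "d1": {"car": 3, "engine": 2, "automobile": 2},
    "d2": {"engine": 4, "repair": 2},
}
query_weights = {"automobile": 2, "engine": 1}

index = build_inverted_index(doc_weights)
ranking = score(query_weights, index)
print(ranking)
```

In this toy example the expansion term lets d1 match a query word that a purely lexical representation of d1 might have missed, while the weights decide how much each match counts. Because the representation is just terms with scalar weights, it fits the postings-list structure of a standard inverted index, which is the property the abstract highlights for the Lucene-based implementation.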

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
