IR · CL · Apr 15, 2021

COIL: Revisit Exact Lexical Match in Information Retrieval with Contextualized Inverted List

arXiv:2104.07186v1 · 761 citations
AI Analysis

This work targets the gap between efficiency and effectiveness in neural information retrieval, for applications where search must be both fast and accurate. It is a hybrid of lexical and neural retrieval rather than a purely incremental advance.

The paper tackles the trade-off between semantic matching and computational efficiency in information retrieval. It introduces COIL, a contextualized exact-match architecture that scores a document using the contextualized representations of the tokens it shares with the query. COIL outperforms both classical and deep-learning retrievers at comparable or lower latency.
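For concreteness, the exact-match scoring described above can be written as a worked formula. This is our reading of the paper's formulation, and the notation is our own: $\mathbf{v}_i^{q}$ is the contextualized vector of the $i$-th query token, $\mathbf{v}_j^{d}$ that of the $j$-th document token, and the inner maximum runs only over document positions $j$ whose token is identical to $q_i$.

$$
s_{\text{tok}}(q, d) = \sum_{q_i \in q \cap d} \; \max_{j \,:\, d_j = q_i} \left( \mathbf{v}_i^{q} \right)^{\top} \mathbf{v}_j^{d}
$$

Query tokens that never appear in the document contribute nothing, which is what keeps the computation compatible with an inverted list. As we understand the paper, its full model additionally adds a dot product between the query and document [CLS] representations to this sum, so that some soft semantic signal is retained.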

Classical information retrieval systems such as BM25 rely on exact lexical match and carry out search efficiently with an inverted list index. Recent neural IR models shift toward soft semantic matching between all query and document terms, but they lose the computational efficiency of exact-match systems. This paper presents COIL, a contextualized exact-match retrieval architecture that brings semantics to lexical matching. COIL scores a document using the contextualized representations of the query and document tokens that overlap. The new architecture stores contextualized token representations in inverted lists, combining the efficiency of exact match with the representational power of deep language models. Our experimental results show that COIL outperforms classical lexical retrievers and state-of-the-art deep LM retrievers with similar or smaller latency.
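The idea of storing contextualized token vectors in inverted lists can be illustrated with a toy index. This is a minimal sketch under stated assumptions and is not the authors' implementation. The class and method names (`ContextualizedInvertedIndex`, `add_document`, `search`) are hypothetical. The short hand-written vectors stand in for the token vectors a deep LM encoder would produce, and scoring follows the sum-of-max dot product over exactly matching tokens.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumption: a token vector is a plain list of floats standing in for
# the contextualized representation an LM encoder would output.
Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


class ContextualizedInvertedIndex:
    """Hypothetical toy index: token -> postings of (doc_id, token vector)."""

    def __init__(self) -> None:
        self.postings: Dict[str, List[Tuple[str, Vector]]] = defaultdict(list)

    def add_document(self, doc_id: str, tokens: List[Tuple[str, Vector]]) -> None:
        # Every occurrence keeps its own contextual vector, so a token that
        # repeats inside one document yields several postings.
        for token, vec in tokens:
            self.postings[token].append((doc_id, vec))

    def search(self, query: List[Tuple[str, Vector]]) -> Dict[str, float]:
        scores: Dict[str, float] = defaultdict(float)
        for token, q_vec in query:
            # Exact match: only this token's posting list is read;
            # .get() avoids inserting empty lists for unseen tokens.
            best: Dict[str, float] = {}
            for doc_id, d_vec in self.postings.get(token, []):
                sim = dot(q_vec, d_vec)
                # Keep the best-matching occurrence per document (the max).
                if doc_id not in best or sim > best[doc_id]:
                    best[doc_id] = sim
            # Sum the per-token maxima into each document's score.
            for doc_id, sim in best.items():
                scores[doc_id] += sim
        return dict(scores)


if __name__ == "__main__":
    index = ContextualizedInvertedIndex()
    # "bank" in a river context vs. "bank" in a finance context.
    index.add_document("d1", [("bank", [1.0, 0.0]), ("river", [0.0, 1.0])])
    index.add_document(
        "d2", [("bank", [0.0, 1.0]), ("loan", [1.0, 0.0]), ("bank", [0.5, 0.5])]
    )
    print(index.search([("bank", [1.0, 0.0])]))
```

Both documents contain "bank", but `d1` scores higher because its occurrence has a vector closer to the query's. A purely lexical matcher would not draw that distinction. A real system would likely store each posting list as a matrix and score it with a single matrix product rather than a Python loop.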

Code Implementations: 1 repo
