IR, Jul 12, 2021

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant

arXiv:2107.05720v139.8617 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for efficient and effective first-stage retrieval in ranking pipelines. It offers a simple model, trained end-to-end, that balances accuracy and speed, though the contribution appears incremental because it builds on existing sparse representation methods.

The paper tackles first-stage ranking in neural information retrieval by introducing SPLADE, a model that uses explicit sparsity regularization and a log-saturation effect on term weights. This yields highly sparse representations and results competitive with state-of-the-art dense and sparse methods.

In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines. Learning dense embeddings to conduct retrieval using efficient approximate nearest neighbors methods has proven to work well. Meanwhile, there has been a growing interest in learning sparse representations for documents and queries, that could inherit from the desirable properties of bag-of-words models such as the exact matching of terms and the efficiency of inverted indexes. In this work, we present a new first-stage ranker based on explicit sparsity regularization and a log-saturation effect on term weights, leading to highly sparse representations and competitive results with respect to state-of-the-art dense and sparse methods. Our approach is simple, trained end-to-end in a single stage. We also explore the trade-off between effectiveness and efficiency, by controlling the contribution of the sparsity regularization.
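The abstract names two ingredients, log-saturation on term weights and explicit sparsity regularization, without giving formulas. The following is a minimal sketch of how they could fit together, under stated assumptions: the saturation is taken to be log(1 + ReLU(x)) summed over input tokens, the regularizer is taken to be a FLOPS-style penalty (the squared mean weight of each vocabulary term across a batch), and relevance is scored with a dot product. The function names and the weight `lam` are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math


def log_saturated_weights(token_logits):
    """Collapse per-token vocabulary logits into one sparse term-weight vector.

    token_logits: list of lists, one inner list of vocabulary logits per input token.
    Assumed form: weight_j = sum over tokens of log(1 + max(logit_ij, 0)).
    """
    vocab_size = len(token_logits[0])
    weights = [0.0] * vocab_size
    for logits in token_logits:
        for j, x in enumerate(logits):
            # Negative logits contribute exactly zero, which keeps the vector sparse;
            # the logarithm dampens very large logits so no single term dominates.
            weights[j] += math.log1p(max(x, 0.0))
    return weights


def flops_penalty(batch_weights):
    """Assumed FLOPS-style regularizer: sum over terms of (mean weight in batch)^2."""
    n = len(batch_weights)
    vocab_size = len(batch_weights[0])
    total = 0.0
    for j in range(vocab_size):
        mean_j = sum(w[j] for w in batch_weights) / n
        total += mean_j ** 2
    return total


def dot_score(query_weights, doc_weights):
    """Relevance score as a dot product; only terms active on both sides contribute."""
    return sum(q * d for q, d in zip(query_weights, doc_weights))


def regularized_loss(ranking_loss, batch_weights, lam):
    """Hypothetical total loss: lam sets the effectiveness/efficiency trade-off."""
    return ranking_loss + lam * flops_penalty(batch_weights)


# Toy example with a 3-term vocabulary and two input tokens.
doc = log_saturated_weights([[2.0, -1.0, 0.0], [0.5, -3.0, 0.0]])
query = log_saturated_weights([[1.0, 0.0, 0.0]])
```

In this sketch, raising `lam` pushes more term weights toward zero, giving shorter posting lists in an inverted index at some cost in ranking quality, which is the trade-off the abstract describes.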
