IR · May 9, 2021

Joint Learning of Deep Retrieval Model and Product Quantization based Embedding Index

arXiv:2105.03933v3 · 34 citations · Has Code
Originality: Highly original
AI Analysis

This paper addresses efficiency and accuracy bottlenecks in large-scale retrieval systems used in applications such as search engines and recommendation systems.

The paper tackles a problem in deep retrieval systems: embedding learning and index building are usually performed as separate steps, which adds indexing time and degrades retrieval accuracy. It proposes Poeem, a method that trains a product quantization based embedding index jointly with the deep retrieval model. The results show significant improvements in retrieval accuracy and a reduction of indexing time to almost none.
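To make "product quantization based embedding index" concrete, here is a minimal pure-Python sketch of generic product quantization (PQ), not Poeem's implementation. It splits an embedding into equal-length sub-vectors, stores only the index of the nearest centroid in each subspace, and scores a query against a stored code with per-subspace lookup tables. The codebooks are tiny and hand-picked for illustration, and all function names are hypothetical. In Poeem the codebooks are trained jointly with the retrieval model, and the abstract names a gradient straight-through estimator as the way training passes through the non-differentiable centroid assignment; that training step is not shown here.

```python
from typing import List, Sequence

Vector = List[float]
Codebooks = List[List[Vector]]  # codebooks[m][k] = k-th centroid of subspace m


def split(x: Sequence[float], num_subspaces: int) -> List[Vector]:
    """Split x into num_subspaces contiguous sub-vectors of equal length."""
    if len(x) % num_subspaces != 0:
        raise ValueError("embedding dimension must be divisible by num_subspaces")
    step = len(x) // num_subspaces
    return [list(x[m * step:(m + 1) * step]) for m in range(num_subspaces)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def pq_encode(x: Sequence[float], codebooks: Codebooks) -> List[int]:
    """Assign each sub-vector to its nearest centroid; the indices form the PQ code."""
    code = []
    for sub, cb in zip(split(x, len(codebooks)), codebooks):
        code.append(min(range(len(cb)), key=lambda k: sq_dist(sub, cb[k])))
    return code


def pq_decode(code: Sequence[int], codebooks: Codebooks) -> Vector:
    """Reconstruct the quantized embedding by concatenating the chosen centroids."""
    out: Vector = []
    for k, cb in zip(code, codebooks):
        out.extend(cb[k])
    return out


def build_tables(query: Sequence[float], codebooks: Codebooks) -> List[List[float]]:
    """Per-subspace tables: inner product of each query sub-vector with every centroid."""
    q_subs = split(query, len(codebooks))
    return [[dot(q_sub, c) for c in cb] for q_sub, cb in zip(q_subs, codebooks)]


def pq_score(tables: List[List[float]], code: Sequence[int]) -> float:
    """Approximate <query, item> by summing one table entry per subspace."""
    return sum(table[k] for table, k in zip(tables, code))


if __name__ == "__main__":
    # Toy codebooks: 2 subspaces, each with 2 two-dimensional centroids (hand-picked, not learned).
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0]],
        [[1.0, 0.0], [0.0, 1.0]],
    ]
    item = [0.9, 1.2, 0.1, 0.8]
    query = [1.0, 2.0, 3.0, 0.5]
    code = pq_encode(item, codebooks)
    print(code)                                             # [1, 1]
    print(pq_decode(code, codebooks))                       # [1.0, 1.0, 0.0, 1.0]
    print(pq_score(build_tables(query, codebooks), code))   # 3.5
```

Because the lookup tables depend only on the query, they can be built once per query and reused for every stored code, so scoring an item costs one table lookup per subspace instead of a full inner product over the original embedding.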

An embedding index that enables fast approximate nearest neighbor (ANN) search serves as an indispensable component of state-of-the-art deep retrieval systems. Traditional approaches, which often separate embedding learning and index building into two steps, incur additional indexing time and degraded retrieval accuracy. In this paper, we propose a novel method called Poeem, which stands for product quantization based embedding index jointly trained with deep retrieval model, to unify the two separate steps within end-to-end training by utilizing a few techniques, including the gradient straight-through estimator, a warm start strategy, optimal space decomposition and Givens rotation. Extensive experimental results show that the proposed method not only improves retrieval accuracy significantly but also reduces the indexing time to almost none. We have open sourced our approach for the sake of comparison and reproducibility.
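The abstract lists Givens rotation among Poeem's techniques without detailing it. A common use of a rotation in PQ is to rotate embeddings before splitting them into subspaces; assuming that is its role here, building the rotation as a product of Givens rotations keeps it orthogonal for any choice of angles. The sketch below, with hypothetical names and angles, applies such a product and checks that inner products between the rotated query and item are unchanged, so the rotation can redistribute information across subspaces without altering exact similarity scores.

```python
import math
from typing import List, Sequence, Tuple

# (i, j, theta): rotate the coordinate plane spanned by dimensions i and j by angle theta.
# In a trained model the angles would be learnable parameters (assumption for illustration).
Givens = Tuple[int, int, float]


def apply_givens(x: Sequence[float], rotations: Sequence[Givens]) -> List[float]:
    """Apply Givens rotations in order; their product is an orthogonal matrix."""
    y = list(x)
    for i, j, theta in rotations:
        c, s = math.cos(theta), math.sin(theta)
        yi, yj = y[i], y[j]
        y[i] = c * yi - s * yj
        y[j] = s * yi + c * yj
    return y


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(ai * bi for ai, bi in zip(a, b))


if __name__ == "__main__":
    # Hypothetical angles; the (1, 2) rotation mixes dimensions that would fall in
    # different PQ subspaces if a 4-d vector were split into two 2-d halves.
    rotations = [(0, 1, 0.3), (2, 3, -1.1), (1, 2, 0.7)]
    query = [1.0, 2.0, 3.0, 0.5]
    item = [0.9, 1.2, 0.1, 0.8]
    rq, ri = apply_givens(query, rotations), apply_givens(item, rotations)
    # Orthogonality: inner products (and norms) are unchanged, up to float rounding.
    print(abs(dot(rq, ri) - dot(query, item)) < 1e-9)  # True
```

Each Givens rotation touches only two coordinates, and because orthogonality holds for any angles, treating the angles as trainable parameters is one way to learn a rotation by gradient descent without an explicit orthogonality constraint.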

Code Implementations: 1 repo