CL IR Apr 16, 2021

Matching-oriented Product Quantization For Ad-hoc Retrieval

Shitao Xiao, Zheng Liu, Yingxia Shao, Defu Lian, Xing Xie

arXiv:2104.07858v32.010 citations Has Code

Originality: Highly original

AI Analysis

This work addresses the challenge of improving retrieval accuracy in ad-hoc search systems, representing an incremental advancement in supervised quantization methods.

The paper addresses the limited gains that supervised product quantization has shown over unsupervised baselines in ad-hoc retrieval. It proposes Matching-oriented Product Quantization (MoPQ), trained with a novel Multinoulli Contrastive Loss (MCL) objective, and reports state-of-the-art results on four real-world datasets.

Product quantization (PQ) is a widely used technique for ad-hoc retrieval. Recent studies propose supervised PQ, where the embedding and quantization models can be jointly trained with supervised learning. However, there is a lack of appropriate formulation of the joint training objective; thus, the improvements over previous non-supervised baselines are limited in reality. In this work, we propose the Matching-oriented Product Quantization (MoPQ), where a novel objective Multinoulli Contrastive Loss (MCL) is formulated. With the minimization of MCL, we are able to maximize the matching probability of query and ground-truth key, which contributes to the optimal retrieval accuracy. Given that the exact computation of MCL is intractable due to the demand of vast contrastive samples, we further propose the Differentiable Cross-device Sampling (DCS), which significantly augments the contrastive samples for precise approximation of MCL. We conduct extensive experimental studies on four real-world datasets, whose results verify the effectiveness of MoPQ. The code is available at https://github.com/microsoft/MoPQ.
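To make the two ingredients in the abstract more concrete, here is a minimal NumPy sketch of (1) standard product quantization, where each vector is split into sub-vectors and each sub-vector is replaced by its nearest codeword, and (2) a generic in-batch softmax contrastive loss between queries and quantized keys. This is not the paper's MCL or DCS: the function names `pq_quantize` and `in_batch_contrastive_loss`, the random codebooks, and the use of inner-product scores are all assumptions made for illustration, and the hard `argmin` shown here covers the forward computation only (it is not differentiable).

```python
import numpy as np

def pq_quantize(x, codebooks):
    """Quantize each row of x with the nearest codeword in every subspace.

    x: (n, d) array of vectors.
    codebooks: (m, k, d // m) array, i.e. m codebooks of k codewords each.
    Returns (codes, reconstruction) with shapes (n, m) and (n, d).
    """
    n, d = x.shape
    m, k, sub_d = codebooks.shape
    assert d == m * sub_d, "d must equal m * sub_d"
    sub = x.reshape(n, m, sub_d)  # split every vector into m sub-vectors
    # Squared distance from each sub-vector to each codeword: (n, m, k).
    dist = ((sub[:, :, None, :] - codebooks[None, :, :, :]) ** 2).sum(-1)
    codes = dist.argmin(-1)  # nearest codeword index per subspace
    recon = codebooks[np.arange(m)[None, :], codes]  # (n, m, sub_d)
    return codes, recon.reshape(n, d)

def in_batch_contrastive_loss(queries, quantized_keys):
    """Generic in-batch softmax loss (an illustrative stand-in, not MCL).

    Row i of `queries` is assumed to match row i of `quantized_keys`;
    all other rows in the batch act as negatives.
    """
    scores = queries @ quantized_keys.T  # (n, n) inner products
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

# Toy usage with random data (hypothetical sizes).
rng = np.random.default_rng(0)
keys = rng.normal(size=(8, 16))
queries = keys + 0.1 * rng.normal(size=(8, 16))
codebooks = rng.normal(size=(4, 32, 4))  # m=4 subspaces, k=32 codewords
codes, quantized = pq_quantize(keys, codebooks)
loss = in_batch_contrastive_loss(queries, quantized)
```

Each key is stored as 4 small integers instead of 16 floats, and the loss is lower when a query scores its own quantized key above the other keys in the batch. Adding more keys to the batch gives more negatives per query, which is the general motivation the abstract gives for augmenting the contrastive samples.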
