CV · Sep 20, 2015

On Large-Scale Retrieval: Binary or n-ary Coding?

Mahyar Najibi, Mohammad Rastegari, Larry S. Davis

arXiv:1509.06066v11.3

Originality: Incremental advance

AI Analysis

This work addresses efficient information retrieval over large datasets, a problem relevant to researchers and practitioners alike. It is an incremental advance: it compares binary and n-ary coding methods under specific retrieval strategies and introduces a new coding method suited to them.

The paper investigates whether binary or n-ary coding performs better for large-scale retrieval under different strategies, finding that n-ary LSQ excels in Distance Estimation while binary LSQ is superior for Subset Indexing in image retrieval.

The growing amount of data in modern datasets creates a need to search and retrieve information efficiently. Distance Estimation and Subset Indexing are the main approaches to making large-scale search feasible. Although binary coding has been popular for implementing both techniques, n-ary coding (as used in Product Quantization) is also very effective for Distance Estimation. However, the relative performance of the two has not been studied for Subset Indexing. We investigate whether binary or n-ary coding works better under different retrieval strategies. This leads to the design of a new n-ary coding method, Linear Subspace Quantization (LSQ), which, unlike other n-ary encoders, can be used as a similarity-preserving embedding. Experiments on image retrieval show that when Distance Estimation is used, n-ary LSQ outperforms other methods. When Subset Indexing is applied, however, binary codes are, interestingly, more effective, and binary LSQ achieves the best accuracy.
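To make the two retrieval strategies concrete, here is a minimal Python sketch under stated assumptions. It does not implement LSQ. The binary codes come from random-hyperplane (sign-of-projection) hashing, and the n-ary codes come from a toy Product Quantization trained with a few rounds of k-means; both are generic stand-ins. Distance Estimation ranks every database item by an estimated distance (Hamming distance for binary codes, table-lookup asymmetric distance for PQ codes). Subset Indexing is reduced here to its simplest form: only items whose code exactly matches the query's code are retrieved. All function names, the synthetic data, and the parameters (16 bits; 4 subspaces of 8 centroids each) are hypothetical.

```python
import random
from collections import defaultdict

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# --- Binary coding: random-hyperplane sign codes (generic stand-in, not LSQ) ---
def make_hyperplanes(dim, n_bits, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def binary_code(x, planes):
    """One bit per hyperplane: 1 if x lies on its non-negative side."""
    return tuple(int(sum(p * v for p, v in zip(plane, x)) >= 0) for plane in planes)

def hamming(a, b):
    """Number of differing bits, used as the binary distance estimate."""
    return sum(u != v for u, v in zip(a, b))

# --- n-ary coding: toy Product Quantization (PQ) ---
def split(x, m):
    """Cut x into m equal subvectors (assumes len(x) is divisible by m)."""
    step = len(x) // m
    return [x[i * step:(i + 1) * step] for i in range(m)]

def kmeans(points, k, iters, rng):
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: sq_dist(p, centroids[c]))].append(p)
        # Mean of each cluster; keep the old centroid if a cluster is empty.
        centroids = [[sum(col) / len(g) for col in zip(*g)] if g else centroids[c]
                     for c, g in enumerate(groups)]
    return centroids

def train_pq(data, m, k, rng, iters=10):
    """One k-centroid codebook per subspace."""
    return [kmeans([split(x, m)[j] for x in data], k, iters, rng) for j in range(m)]

def pq_code(x, codebooks):
    """n-ary code: index of the nearest centroid in each subspace."""
    return tuple(min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
                 for sub, cb in zip(split(x, len(codebooks)), codebooks))

def pq_tables(query, codebooks):
    """Per-subspace squared distances from the query to every centroid."""
    subs = split(query, len(codebooks))
    return [[sq_dist(s, c) for c in cb] for s, cb in zip(subs, codebooks)]

def pq_distance(tables, code):
    """Asymmetric distance estimate: one table lookup per subspace."""
    return sum(tables[j][c] for j, c in enumerate(code))

# --- Subset Indexing, simplest form: exact-match buckets keyed by code ---
def build_index(codes):
    index = defaultdict(list)
    for i, code in enumerate(codes):
        index[code].append(i)
    return index

rng = random.Random(0)
dim, n = 8, 200
data = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
query = [v + 0.05 * rng.gauss(0.0, 1.0) for v in data[17]]  # near-duplicate of item 17

planes = make_hyperplanes(dim, 16, rng)
bin_codes = [binary_code(x, planes) for x in data]
codebooks = train_pq(data, m=4, k=8, rng=rng)
pq_codes = [pq_code(x, codebooks) for x in data]

# Distance Estimation: rank every item by its estimated distance to the query.
qb = binary_code(query, planes)
tables = pq_tables(query, codebooks)
print("Hamming top-5:", sorted(range(n), key=lambda i: hamming(qb, bin_codes[i]))[:5])
print("PQ top-5:", sorted(range(n), key=lambda i: pq_distance(tables, pq_codes[i]))[:5])

# Subset Indexing: retrieve only the bucket that matches the query's code.
bin_index, pq_index = build_index(bin_codes), build_index(pq_codes)
print("binary bucket:", bin_index.get(qb, []))
print("PQ bucket:", pq_index.get(pq_code(query, codebooks), []))
```

Exact-match buckets are the crudest form of Subset Indexing. Practical systems typically also probe nearby codes, such as every binary code within a small Hamming radius of the query's code.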

