IT CV IRJan 26, 2017

Sparse Ternary Codes for similarity search have higher coding gain than dense binary codes

Sohrab Ferdowsi, Slava Voloshynovskiy, Dimche Kostadinov, Taras Holotyak

arXiv:1701.07675v27.311 citations

Originality Incremental advance

AI Analysis

This addresses the need for efficient similarity search in large-scale databases, offering an incremental improvement over existing methods.

The paper tackles the problem of Approximate Nearest Neighbor search by proposing sparse ternary codes, which achieve higher coding gains and lower complexity compared to traditional dense binary codes.

This paper addresses the problem of Approximate Nearest Neighbor (ANN) search in pattern recognition where feature vectors in a database are encoded as compact codes in order to speed-up the similarity search in large-scale databases. Considering the ANN problem from an information-theoretic perspective, we interpret it as an encoding, which maps the original feature vectors to a less entropic sparse representation while requiring them to be as informative as possible. We then define the coding gain for ANN search using information-theoretic measures. We next show that the classical approach to this problem, which consists of binarization of the projected vectors is sub-optimal. Instead, a properly designed ternary encoding achieves higher coding gains and lower complexity.

View on arXiv PDF

Similar