CVDBDSIRFeb 28, 2017

Billion-scale similarity search with GPUs

arXiv:1702.08734v15266 citationsHas Code
Originality Highly original
AI Analysis

This work addresses the challenge of scaling similarity search for large datasets like images and videos, which is crucial for specialized database systems, by significantly improving GPU efficiency.

The paper tackled the problem of efficiently utilizing GPUs for billion-scale similarity search by proposing a high-performance k-selection design, achieving up to 55% of theoretical peak performance and speeding up nearest neighbor implementations by 8.5x over prior GPU state of the art, enabling tasks like constructing a k-NN graph on 95 million images in 35 minutes.

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures. This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art. We apply it in different similarity search scenarios, by proposing optimized design for brute-force, approximate and compressed-domain search based on product quantization. In all these setups, we outperform the state of the art by large margins. Our implementation enables the construction of a high accuracy k-NN graph on 95 million images from the Yfcc100M dataset in 35 minutes, and of a graph connecting 1 billion vectors in less than 12 hours on 4 Maxwell Titan X GPUs. We have open-sourced our approach for the sake of comparison and reproducibility.

Code Implementations14 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes