DS IR May 29, 2017

A scalable solution to the nearest neighbor search problem through local-search methods on neighbor graphs

Eric S. Tellez, Guillermo Ruiz, Edgar Chavez, Mario Graff

arXiv:1705.10351v4

3 citations

Originality: Incremental advance

AI Analysis

This provides a scalable solution for data indexing in high-dimensional spaces, though it appears incremental as it builds on existing local-search methods.

The paper tackles the nearest neighbor search problem for high-dimensional data by introducing an algorithm that minimizes a kernel function using metaheuristics, achieving competitive performance in speed, accuracy, and memory across benchmarks.

Near neighbor search (NNS) is a powerful abstraction for data access; however, data indexing is troublesome even for approximate indexes. For intrinsically high-dimensional data, high-quality fast searches demand indexes with either impractically large memory usage or impractically long preprocessing time. In this paper, we introduce an algorithm that solves a nearest-neighbor query $q$ by minimizing a kernel function defined by the distance from $q$ to each object in the database. The minimization is performed with metaheuristics so that the problem is solved rapidly; although some methods in the literature use this strategy behind the scenes, our approach is the first to use it explicitly. We also provide two approaches for selecting edges during the graph's construction stage that simultaneously limit the memory footprint and reduce the number of free parameters. We carry out a thorough experimental comparison with state-of-the-art indexes on synthetic and real-world datasets and find that our contributions achieve competitive performance in speed, accuracy, and memory in almost all of our benchmarks.
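The core idea in the abstract, treating a query as the minimization of the distance to $q$ over a neighbor graph, can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a brute-force k-nearest-neighbor graph, Euclidean distance, and plain greedy descent with random restarts standing in for the metaheuristics the paper mentions. The names `build_knn_graph` and `greedy_search` and the parameters `k`, `restarts`, and `seed` are hypothetical.

```python
import math
import random


def build_knn_graph(points, k):
    """Brute-force k-nearest-neighbor graph (quadratic cost; illustration only)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph


def greedy_search(points, graph, q, restarts=8, seed=0):
    """Approximate nearest neighbor of q by greedy descent on the graph.

    From each random start, repeatedly move to the neighbor closest to q
    until no neighbor improves on the current node, then keep the best
    local minimum found over all restarts.
    """
    rng = random.Random(seed)
    best, best_d = -1, math.inf
    for _ in range(restarts):
        cur = rng.randrange(len(points))
        cur_d = math.dist(q, points[cur])
        while True:
            nb = min(graph[cur], key=lambda j: math.dist(q, points[j]))
            nb_d = math.dist(q, points[nb])
            if nb_d >= cur_d:  # local minimum of the distance to q
                break
            cur, cur_d = nb, nb_d
        if cur_d < best_d:
            best, best_d = cur, cur_d
    return best, best_d


# Toy usage: ten points on a line, query at 6.2.
points = [(float(i),) for i in range(10)]
graph = build_knn_graph(points, k=2)
index, distance = greedy_search(points, graph, (6.2,))
```

On this toy line the distance to the query has a single minimum along the graph, so every start ends at the point with coordinate 6. In high-dimensional data greedy descent can stop at a local minimum that is not the true nearest neighbor, which is why the result is approximate and why restarts (or a stronger metaheuristic) matter.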
