DS CC CG CR IRDec 8, 2017

Graph-based time-space trade-offs for approximate near neighbors

arXiv:1712.03158v18.625 citations

Originality Incremental advance

AI Analysis

This work provides a rigorous asymptotic analysis for graph-based nearest neighbor search, which is incremental but important for high-dimensional data processing in fields like machine learning and information retrieval.

The paper tackles the problem of finding approximate nearest neighbors in high-dimensional spaces by analyzing graph-based methods, showing that for random datasets on a sphere, they achieve query time and space complexities with specific trade-offs, matching optimal hash-based methods for small approximation factors and near-linear memory.

We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size $n = 2^{o(d)}$ on the $d$-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approximate nearest neighbor problem with approximation factor $c > 1$ in query time $n^{ρ_q + o(1)}$ and space $n^{1 + ρ_s + o(1)}$, for arbitrary $ρ_q, ρ_s \geq 0$ satisfying \begin{align} (2c^2 - 1) ρ_q + 2 c^2 (c^2 - 1) \sqrt{ρ_s (1 - ρ_s)} \geq c^4. \end{align} Graph-based near neighbor searching is especially competitive with hash-based methods for small $c$ and near-linear memory, and in this regime the asymptotic scaling of a greedy graph-based search matches the recent optimal hash-based trade-offs of Andoni-Laarhoven-Razenshteyn-Waingarten [SODA'17]. We further study how the trade-offs scale when the data set is of size $n = 2^{Θ(d)}$, and analyze asymptotic complexities when applying these results to lattice sieving.

View on arXiv PDF

Similar