Bandit-Based Monte Carlo Optimization for Nearest Neighbors
The contribution is a more efficient method for nearest neighbor search in high-dimensional data; it is incremental in that it builds on existing Monte Carlo and multi-armed bandit techniques.
The paper tackles high-dimensional k-nearest neighbors with a bandit-based Monte Carlo optimization algorithm that identifies the exact nearest neighbors with high probability. Under regularity assumptions, its complexity is O((n+d) log^2(nd/δ)) for n points in d dimensions and error probability δ, compared with O(nd) for exact computation, and in numerical simulations on real datasets it outperforms state-of-the-art methods such as kGraph, NGT, and LSH.
The celebrated Monte Carlo method estimates an expensive-to-compute quantity by random sampling. Bandit-based Monte Carlo optimization is a general technique for computing the minimum of many such expensive-to-compute quantities by adaptive random sampling. The technique converts an optimization problem into a statistical estimation problem which is then solved via multi-armed bandits. We apply this technique to solve the problem of high-dimensional $k$-nearest neighbors, developing an algorithm which we prove is able to identify exact nearest neighbors with high probability. We show that under regularity assumptions on a dataset of $n$ points in $d$-dimensional space, the complexity of our algorithm scales logarithmically with the dimension of the data as $O\left((n+d)\log^2 \left(\frac{nd}{δ}\right)\right)$ for error probability $δ$, rather than linearly as in exact computation requiring $O(nd)$. We corroborate our theoretical results with numerical simulations, showing that our algorithm outperforms both exact computation and state-of-the-art algorithms such as kGraph, NGT, and LSH on real datasets.
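To make the idea concrete, the following is a minimal sketch of how nearest neighbor search can be cast as a bandit problem. It is not the authors' algorithm: the abstract does not spell out the procedure, so the sketch assumes a simple successive-elimination scheme for the single nearest neighbor ($k=1$). Each candidate point is an arm, a pull evaluates the squared difference on a batch of randomly chosen coordinates, and arms whose lower confidence bound exceeds the smallest upper confidence bound are discarded. The function name `bandit_nearest_neighbor`, the `batch` parameter, and the confidence radius built from the empirical variance are all illustrative assumptions and do not carry the paper's guarantees.

```python
import numpy as np


def bandit_nearest_neighbor(query, points, delta=0.01, batch=16, seed=0):
    """Successive-elimination sketch for 1-nearest neighbor (squared L2).

    Returns (index of the selected point, number of coordinate evaluations).
    Illustrative only: the confidence radius below is a heuristic plug-in
    bound, not the one analyzed in the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    active = np.arange(n)      # arms still in contention
    sums = np.zeros(n)         # running sum of sampled squared differences
    sq_sums = np.zeros(n)      # running sum of their squares (for variance)
    pulls = 0                  # coordinates sampled so far per active arm
    coord_evals = 0            # total cost in coordinate evaluations

    # Sample adaptively until one arm remains or sampling stops being
    # cheaper than computing the remaining distances exactly.
    while active.size > 1 and pulls + batch <= d:
        coords = rng.integers(0, d, size=batch)   # sampled with replacement
        diffs = (points[np.ix_(active, coords)] - query[coords]) ** 2
        sums[active] += diffs.sum(axis=1)
        sq_sums[active] += (diffs ** 2).sum(axis=1)
        pulls += batch
        coord_evals += diffs.size

        means = sums[active] / pulls
        var = np.maximum(sq_sums[active] / pulls - means ** 2, 0.0)
        # Assumed heuristic radius with a union bound over arms and rounds.
        radius = np.sqrt(2.0 * var * np.log(4.0 * n * pulls ** 2 / delta) / pulls)

        # Keep arms whose lower bound does not exceed the best upper bound.
        keep = (means - radius) <= np.min(means + radius)
        active = active[keep]

    if active.size == 1:
        return int(active[0]), coord_evals

    # Fall back to exact distances for the few surviving arms.
    exact = ((points[active] - query) ** 2).sum(axis=1)
    coord_evals += active.size * d
    return int(active[np.argmin(exact)]), coord_evals
```

On data where most candidates are far from the query, most arms are eliminated after a small number of sampled coordinates, so the total number of coordinate evaluations can fall well below the $nd$ needed by exact computation. The returned cost counter makes that comparison easy to check on a given dataset.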