Empirical complexity of comparator-based nearest neighbor descent
This work addresses the efficiency and scalability of nearest neighbor search for data analysis, but it is incremental as it focuses on implementation and empirical validation of an existing algorithm.
The paper presents a Java parallel streams implementation of the K-nearest neighbor descent algorithm with a statistical termination criterion, achieving an overall complexity of O(n K^2 log_K(n)) in experiments and showing high accuracy up to 20 dimensions but declining in higher dimensions.
A Java parallel streams implementation of the $K$-nearest neighbor descent algorithm is presented using a natural statistical termination criterion. Input data consist of a set $S$ of $n$ objects of type V, and a Function<V, Comparator<V>>, which enables any $x \in S$ to decide which of $y, z \in S\setminus\{x\}$ is more similar to $x$. Experiments with the Kullback-Leibler divergence Comparator support the prediction that the number of rounds of $K$-nearest neighbor updates need not exceed twice the diameter of the undirected version of a random regular out-degree $K$ digraph on $n$ vertices. Overall complexity was $O(n K^2 \log_K(n))$ in the class of examples studied. When objects are sampled uniformly from a $d$-dimensional simplex, accuracy of the $K$-nearest neighbor approximation is high up to $d = 20$, but declines in higher dimensions, as theory would predict.