Shortest path distance in random k-nearest neighbor graphs
This paper addresses a theoretical issue in graph-based machine learning methods and reveals limitations of unweighted kNN graphs for distance estimation.
The paper investigates the convergence of shortest path distances in random k-nearest neighbor graphs as the sample size increases. It proves that in unweighted graphs these distances converge to a distance function whose properties are harmful for machine learning, and it also examines the behavior of weighted graphs.
Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are detrimental to machine learning. We also study the behavior of the shortest path distance in weighted kNN graphs.
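To make the two quantities in the abstract concrete, the following is a minimal sketch of how one might compute both shortest path distances on a sample. It is an illustration under stated assumptions, not the paper's construction: the function names (`knn_graph`, `hop_distances`, `weighted_distances`) are hypothetical, the graph is the symmetric kNN graph (two points are joined if either is among the other's k nearest neighbors), the weights are Euclidean edge lengths, and the uniform density on the unit square and the values n = 200, k = 8 are arbitrary choices.

```python
import heapq
import math
import random
from collections import deque


def knn_graph(points, k):
    """Symmetric kNN graph (an assumed variant): i and j are adjacent
    if either one is among the other's k nearest neighbors.
    Returns a list of dicts mapping neighbor index -> Euclidean edge length."""
    n = len(points)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def hop_distances(adj, source):
    """Unweighted shortest path distance (number of edges) via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def weighted_distances(adj, source):
    """Weighted shortest path distance (sum of edge lengths) via Dijkstra."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# Illustrative sample: n = 200 points, uniform on the unit square (assumption).
random.seed(0)
points = [(random.random(), random.random()) for _ in range(200)]
adj = knn_graph(points, k=8)
hops = hop_distances(adj, 0)        # unweighted distances from point 0
wdist = weighted_distances(adj, 0)  # weighted distances from point 0
```

Comparing `hops[v]` and `wdist[v]` for the same target `v` shows the difference in what is measured: the unweighted distance counts edges regardless of their length, while the weighted distance sums Euclidean edge lengths and is therefore never smaller than the straight-line distance between the two points.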