Peng-Cheng Lin

5.5IRAug 2, 2019Code

Wan-Lei Zhao, Hui Wang, Peng-Cheng Lin et al.

k-nearest neighbor graph is a fundamental data structure in many disciplines such as information retrieval, data-mining, pattern recognition, and machine learning, etc. In the literature, considerable research has been focusing on how to efficiently build an approximate k-nearest neighbor graph (k-NN graph) for a fixed dataset. Unfortunately, a closely related issue of how to merge two existing k-NN graphs has been overlooked. In this paper, we address the issue of k-NN graph merging in two different scenarios. In the first scenario, a symmetric merge algorithm is proposed to combine two approximate k-NN graphs. The algorithm facilitates large-scale processing by the efficient merging of k-NN graphs that are produced in parallel. In the second scenario, a joint merge algorithm is proposed to expand an existing k-NN graph with a raw dataset. The algorithm enables the incremental construction of a hierarchical approximate k-NN graph. Superior performance is attained when leveraging the hierarchy for NN search of various data types, dimensionality, and distance measures.

8.4IRApr 3, 2019

Graph based Nearest Neighbor Search: Promises and Failures

Peng-Cheng Lin, Wan-Lei Zhao

Recently, graph based nearest neighbor search gets more and more popular on large-scale retrieval tasks. The attractiveness of this type of approaches lies in its superior performance over most of the known nearest neighbor search approaches as well as its genericness to various metrics. In this paper, the role of two strategies, namely hierarchical structure and graph diversification that are adopted as the key steps in the graph based approaches, is investigated. We find the hierarchical structure could not achieve "much better logarithmic complexity scaling" as it was claimed in the original paper, particularly on high dimensional cases. Moreover, we find that similar high search speed efficiency as the one with hierarchical structure could be achieved with the support of flat k-NN graph after graph diversification. Finally, we point out the difficulty, that is faced by most of the graph based search approaches, is directly linked to "curse of dimensionality".

Peng-Cheng Lin

2 Papers