IRJul 13, 2019Code
High Dimensional Similarity Search with Satellite System Graph: Efficiency, Scalability, and Unindexed Query CompatibilityCong Fu, Changxu Wang, Deng Cai
Approximate Nearest Neighbor Search (ANNS) in high dimensional space is essential in database and information retrieval. Recently, there has been a surge of interest in exploring efficient graph-based indices for the ANNS problem. Among them, Navigating Spreading-out Graph (NSG) provides fine theoretical analysis and achieves state-of-the-art performance. However, we find there are several limitations with NSG: 1) NSG has no theoretical guarantee on nearest neighbor search when the query is not indexed in the database; 2) NSG is too sparse which harms the search performance. In addition, NSG suffers from high indexing complexity. To address the above problems, we propose the Satellite System Graphs (SSG) and a practical variant NSSG. Specifically, we propose a novel pruning strategy to produce SSGs from the complete graph. SSGs define a new family of MSNETs in which the out-edges of each node are distributed evenly in all directions. Each node in the graph builds effective connections to its neighborhood omnidirectionally, whereupon we derive SSG's excellent theoretical properties for both indexed and unindexed queries. We can adaptively adjust the sparsity of an SSG with a hyper-parameter to optimize the search performance. Further, NSSG is proposed to reduce the indexing complexity of the SSG for large-scale applications. Both theoretical and extensive experimental analyses are provided to demonstrate the strengths of the proposed approach over the existing representative algorithms. Our code has been released at https://github.com/ZJULearning/SSG.
IRFeb 9, 2021
Large-Scale Visual Search with Binary Distributed Graph at AlibabaKang Zhao, Pan Pan, Yun Zheng et al.
Graph-based approximate nearest neighbor search has attracted more and more attentions due to its online search advantages. Numbers of methods studying the enhancement of speed and recall have been put forward. However, few of them focus on the efficiency and scale of offline graph-construction. For a deployed visual search system with several billions of online images in total, building a billion-scale offline graph in hours is essential, which is almost unachievable by most existing methods. In this paper, we propose a novel algorithm called Binary Distributed Graph to solve this problem. Specifically, we combine binary codes with graph structure to speedup online and offline procedures, and achieve comparable performance with the ones in real-value based scenarios by recalling more binary candidates. Furthermore, the graph-construction is optimized to completely distributed implementation, which significantly accelerates the offline process and gets rid of the limitation of memory and disk within a single machine. Experimental comparisons on Alibaba Commodity Data Set (more than three billion images) show that the proposed method outperforms the state-of-the-art with respect to the online/offline trade-off.
LGJul 1, 2017
Fast Approximate Nearest Neighbor Search With The Navigating Spreading-out GraphCong Fu, Chao Xiang, Changxu Wang et al.
Approximate nearest neighbor search (ANNS) is a fundamental problem in databases and data mining. A scalable ANNS algorithm should be both memory-efficient and fast. Some early graph-based approaches have shown attractive theoretical guarantees on search time complexity, but they all suffer from the problem of high indexing time complexity. Recently, some graph-based methods have been proposed to reduce indexing complexity by approximating the traditional graphs; these methods have achieved revolutionary performance on million-scale datasets. Yet, they still can not scale to billion-node databases. In this paper, to further improve the search-efficiency and scalability of graph-based methods, we start by introducing four aspects: (1) ensuring the connectivity of the graph; (2) lowering the average out-degree of the graph for fast traversal; (3) shortening the search path; and (4) reducing the index size. Then, we propose a novel graph structure called Monotonic Relative Neighborhood Graph (MRNG) which guarantees very low search complexity (close to logarithmic time). To further lower the indexing complexity and make it practical for billion-node ANNS problems, we propose a novel graph structure named Navigating Spreading-out Graph (NSG) by approximating the MRNG. The NSG takes the four aspects into account simultaneously. Extensive experiments show that NSG outperforms all the existing algorithms significantly. In addition, NSG shows superior performance in the E-commercial search scenario of Taobao (Alibaba Group) and has been integrated into their search engine at billion-node scale.