CV, Sep 11, 2015

A reliable order-statistics-based approximate nearest neighbor search algorithm

arXiv:1509.03453v2
Originality: Incremental advance
AI Analysis

This paper addresses efficient nearest neighbor search for unstructured data, with potential applications in information retrieval and machine learning. It appears incremental because it builds on locality-sensitive hashing concepts.

The paper tackles the problem of fast approximate nearest neighbor search by proposing an algorithm that classifies vectors based on the order and sign of their largest components, partitioning the space into cones. Experiments on simulated and real-world data show it achieves state-of-the-art performance.

We propose a new algorithm for fast approximate nearest neighbor search based on the properties of ordered vectors. Data vectors are classified based on the index and sign of their largest components, thereby partitioning the space into a number of cones centered at the origin. The query is itself classified, and the search starts from the selected cone and proceeds to neighboring ones. Overall, the proposed algorithm corresponds to locality-sensitive hashing in the space of directions, with hashing based on the order of components. Thanks to the statistical features emerging through ordering, it deals very well with the challenging case of unstructured data, and is a valuable building block for more complex techniques dealing with structured data. Experiments on both simulated and real-world data show that the proposed algorithm provides state-of-the-art performance.
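The classification step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `cone_key`, `build_index`, `search`, and the parameters `k` and `max_cones` are hypothetical. "Largest components" is assumed here to mean largest in absolute value, similarity is assumed to be cosine, and the order in which neighboring cones are visited (ranked by how well the query aligns with each cone's signed coordinates) is a simple stand-in for whatever schedule the paper actually uses.

```python
import math
from collections import defaultdict


def cone_key(vec, k=2):
    """Hash a vector by the indices and signs of its k largest-magnitude
    components, kept in decreasing order of magnitude (assumed convention)."""
    order = sorted(range(len(vec)), key=lambda i: -abs(vec[i]))[:k]
    return tuple((i, 1 if vec[i] >= 0 else -1) for i in order)


def build_index(data, k=2):
    """Group row numbers of `data` by cone key."""
    index = defaultdict(list)
    for row, vec in enumerate(data):
        index[cone_key(vec, k)].append(row)
    return index


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query, data, index, k=2, max_cones=4):
    """Scan the query's own cone first, then other non-empty cones ranked
    by a hypothetical affinity score, up to max_cones cones in total."""
    own = cone_key(query, k)

    def affinity(key):
        # How strongly the query points along the cone's signed coordinates.
        return sum(sign * query[i] for i, sign in key)

    ranked = sorted(index, key=lambda key: (key != own, -affinity(key)))
    best_row, best_sim = None, -2.0
    for key in ranked[:max_cones]:
        for row in index[key]:
            sim = cosine(query, data[row])
            if sim > best_sim:
                best_row, best_sim = row, sim
    return best_row, best_sim
```

With a small `data` list, `index = build_index(data)` followed by `search(query, data, index)` returns the row number and cosine similarity of the best candidate found. Raising `max_cones` trades speed for recall, since more cones are scanned before the search stops.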

