LG IRFeb 18

Neighborhood Stability as a Measure of Nearest Neighbor Searchability

arXiv:2602.16673v11.4h-index: 11

Originality Incremental advance

AI Analysis

This work addresses a gap in evaluating clustering-based ANNS for high-dimensional datasets, providing tools to determine searchability, which is incremental as it builds on existing ANNS methods without introducing a new search algorithm.

The authors tackled the lack of analytical tools to assess the suitability of clustering-based Approximate Nearest Neighbor Search (ANNS) for datasets, introducing two measures—Clustering-Neighborhood Stability Measure (clustering-NSM) and Point-Neighborhood Stability Measure (point-NSM)—that predict ANNS accuracy and clusterability based on nearest neighbor relationships, applicable to various distance functions.

Clustering-based Approximate Nearest Neighbor Search (ANNS) organizes a set of points into partitions, and searches only a few of them to find the nearest neighbors of a query. Despite its popularity, there are virtually no analytical tools to determine the suitability of clustering-based ANNS for a given dataset -- what we call "searchability." To address that gap, we present two measures for flat clusterings of high-dimensional points in Euclidean space. First is Clustering-Neighborhood Stability Measure (clustering-NSM), an internal measure of clustering quality -- a function of a clustering of a dataset -- that we show to be predictive of ANNS accuracy. The second, Point-Neighborhood Stability Measure (point-NSM), is a measure of clusterability -- a function of the dataset itself -- that is predictive of clustering-NSM. The two together allow us to determine whether a dataset is searchable by clustering-based ANNS given only the data points. Importantly, both are functions of nearest neighbor relationships between points, not distances, making them applicable to various distance functions including inner product.

View on arXiv PDF

Similar