DSLGDec 19, 2025

Graph-based Nearest Neighbors with Dynamic Updates via Random Walks

arXiv:2512.18060v1h-index: 14
Originality Incremental advance
AI Analysis

This work addresses a specific bottleneck in retrieval systems, such as those used with large language models, by enabling efficient deletions without compromising performance, though it is incremental relative to existing graph-based ANN methods.

The paper tackles the problem of supporting deletions in graph-based approximate nearest neighbor search, which existing methods like HNSW lack or handle poorly, and proposes a new theoretical framework using random walks to develop a deterministic deletion algorithm that improves tradeoffs in query latency, recall, deletion time, and memory usage.

Approximate nearest neighbor search (ANN) is a common way to retrieve relevant search results, especially now in the context of large language models and retrieval augmented generation. One of the most widely used algorithms for ANN is based on constructing a multi-layer graph over the dataset, called the Hierarchical Navigable Small World (HNSW). While this algorithm supports insertion of new data, it does not support deletion of existing data. Moreover, deletion algorithms described by prior work come at the cost of increased query latency, decreased recall, or prolonged deletion time. In this paper, we propose a new theoretical framework for graph-based ANN based on random walks. We then utilize this framework to analyze a randomized deletion approach that preserves hitting time statistics compared to the graph before deleting the point. We then turn this theoretical framework into a deterministic deletion algorithm, and show that it provides better tradeoff between query latency, recall, deletion time, and memory usage through an extensive collection of experiments.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes