LG, DS, ML | Oct 27, 2025

Sublinear Sketches for Approximate Nearest Neighbor and Kernel Density Estimation

arXiv:2510.23039v1
h-index: 9
Originality: Highly original
AI Analysis

This paper addresses the challenge of analyzing massive, dynamic data streams efficiently, with applications in machine learning and information systems, and offers new theoretical guarantees.

The paper tackles the problem of designing compact sketches for Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) in dynamic data streams, achieving sublinear space and query time guarantees, with experiments showing low error on real-world datasets.

Approximate Nearest Neighbor (ANN) search and Approximate Kernel Density Estimation (A-KDE) are fundamental problems at the core of modern machine learning, with broad applications in data analysis, information systems, and large-scale decision making. In massive and dynamic data streams, a central challenge is to design compact sketches that preserve essential structural properties of the data while enabling efficient queries. In this work, we develop new sketching algorithms that achieve sublinear space and query time guarantees for both ANN and A-KDE for a dynamic stream of data. For ANN in the streaming model, under natural assumptions, we design a sublinear sketch that requires only $\mathcal{O}(n^{1+\rho-\eta})$ memory by storing only a sublinear ($n^{-\eta}$) fraction of the total inputs, where $\rho$ is a parameter of the LSH family, and $0<\eta<1$. Our method supports sublinear query time, batch queries, and extends to the more general Turnstile model. While earlier works have focused on Exact NN, this is the first result on ANN that achieves near-optimal trade-offs between memory size and approximation error. Next, for A-KDE in the Sliding-Window model, we propose a sketch of size $\mathcal{O}\left(RW \cdot \frac{1}{\sqrt{1+\epsilon} - 1} \log^2 N\right)$, where $R$ is the number of sketch rows, $W$ is the LSH range, $N$ is the window size, and $\epsilon$ is the approximation error. This, to the best of our knowledge, is the first theoretical sublinear sketch guarantee for A-KDE in the Sliding-Window model. We complement our theoretical results with experiments on various real-world datasets, which show that the proposed sketches are lightweight and achieve consistently low error in practice.
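To get a feel for the two space bounds quoted in the abstract, the following minimal Python sketch evaluates them for concrete parameter values. It is an illustration only, not the paper's implementation: the function names are hypothetical, the constants hidden by the big-O notation are taken to be 1, and the logarithm is assumed to be base 2 (the base only changes the hidden constant). Note that $n^{1+\rho-\eta}$ is smaller than $n$ only when $\eta > \rho$.

```python
import math


def ann_sketch_memory(n: int, rho: float, eta: float) -> float:
    """Evaluate n^(1 + rho - eta), the ANN memory bound from the abstract.

    Assumption: the hidden big-O constant is taken to be 1.
    """
    if not 0 < eta < 1:
        raise ValueError("eta must lie strictly between 0 and 1")
    return n ** (1 + rho - eta)


def akde_sketch_size(R: int, W: int, N: int, eps: float) -> float:
    """Evaluate R * W * 1/(sqrt(1 + eps) - 1) * log^2(N), the A-KDE bound.

    Assumptions: hidden big-O constant is 1 and the logarithm is base 2.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return R * W * (1.0 / (math.sqrt(1.0 + eps) - 1.0)) * math.log2(N) ** 2


if __name__ == "__main__":
    # Hypothetical parameter values chosen purely for illustration.
    n, rho, eta = 10**6, 0.5, 0.75
    print("ANN bound:", ann_sketch_memory(n, rho, eta), "vs. n =", n)
    print("A-KDE bound:", akde_sketch_size(R=100, W=64, N=10**5, eps=0.1))
```

Shrinking $\epsilon$ makes the factor $1/(\sqrt{1+\epsilon}-1)$ grow roughly like $2/\epsilon$, so under these assumptions the A-KDE bound scales about linearly in $1/\epsilon$ for small approximation error.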

