DS, LG | Oct 21, 2023

Fast Approximation of Similarity Graphs with Kernel Density Estimation

arXiv:2310.13870v13.35 citations | h-index: 4

Originality: Incremental advance

AI Analysis

This work addresses a bottleneck in clustering algorithms for data scientists by providing a faster and more scalable approach to similarity graph construction.

The paper tackles the high time and space cost of constructing similarity graphs for clustering by introducing an algorithmic framework, based on kernel density estimation, that builds a sparse approximation of the fully connected similarity graph while preserving its cluster structure. In the authors' experiments, the method significantly outperforms the corresponding implementations from scikit-learn and FAISS on a variety of datasets.

Constructing a similarity graph from a set $X$ of data points in $\mathbb{R}^d$ is the first step of many modern clustering algorithms. However, typical constructions of a similarity graph have high time complexity and a quadratic space dependency with respect to $|X|$. We address this limitation and present a new algorithmic framework that constructs a sparse approximation of the fully connected similarity graph while preserving its cluster structure. Our algorithm is based on the kernel density estimation problem and is applicable to arbitrary kernel functions. We compare our algorithm with the well-known implementations from the scikit-learn library and the FAISS library, and find that our method significantly outperforms the implementations from both libraries on a variety of datasets.
