LG ML · Nov 19, 2018

An efficient density-based clustering algorithm using reverse nearest neighbour

Stiphen Chowdhury, Renato Cordeiro de Amorim

arXiv:1811.07615v1 · 0.87 citations

Originality: Incremental advance

AI Analysis

This work is an incremental advance, relevant to researchers and practitioners in data mining, that addresses parameter tuning in density-based clustering.

The authors address DBSCAN's sensitivity to its parameters by developing a density-based clustering algorithm that uses reverse nearest neighbour (RNN) queries and has a single parameter. In their experiments it outperformed DBSCAN and ISDBSCAN on synthetic and real-world data sets.
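The core quantity behind an RNN-based density estimate is the reverse nearest neighbour count: a point that appears in many other points' k-nearest-neighbour lists sits in a dense region, so the estimate reflects other entities' neighbourhoods and not just the point's own. The Python sketch below computes these counts by brute force. The function names, the use of Euclidean distance and the tie-breaking by index are illustrative assumptions, not details taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """Return, for each point, the indices of its k nearest other points (Euclidean, brute force)."""
    result = []
    for i, p in enumerate(points):
        # Sort every other point by distance to p; equal distances are broken by index.
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in others[:k]])
    return result


def reverse_knn_counts(points: Sequence[Point], k: int) -> List[int]:
    """For each point x, count how many points list x among their k nearest neighbours."""
    counts = [0] * len(points)
    for neighbours in knn_indices(points, k):
        for j in neighbours:
            counts[j] += 1
    return counts
```

For the points (0, 0), (0, 1), (1, 0), (1, 1) and (10, 10) with k = 2, the isolated point (10, 10) appears in no other point's 2-NN list, so its count is 0, while (1, 1) is listed by three points. The brute-force search costs O(n² log n); a spatial index would be the usual choice for larger data.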

Density-based clustering is the task of discovering high-density regions of entities (clusters) that are separated from each other by contiguous regions of low density. DBSCAN is, arguably, the most popular density-based clustering algorithm. However, its cluster recovery capabilities depend on the combination of its two parameters, the neighbourhood radius ε and the minimum number of points MinPts. In this paper we present a new density-based clustering algorithm which uses reverse nearest neighbour (RNN) queries and has a single parameter. We also show that it is possible to estimate a good value for this parameter using a clustering validity index. The RNN queries enable our algorithm to estimate densities taking more than a single entity into account, and to recover clusters that are not well-separated or have different densities. Our experiments on synthetic and real-world data sets show that our proposed algorithm outperforms DBSCAN and its recent variant ISDBSCAN.
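To make the single-parameter idea concrete, the sketch below turns reverse-kNN counts into a clustering and then picks k with a validity index. It is a hypothetical reconstruction, not the authors' algorithm: the core-point rule (a point is core if at least k points list it among their k nearest neighbours), the treatment of kNN links as undirected edges, the assignment of non-core points, and the use of the silhouette coefficient as the validity index are all assumptions made for illustration.

```python
import math
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
NOISE = -1  # label for points assigned to no cluster


def knn_indices(points: Sequence[Point], k: int) -> List[List[int]]:
    """Indices of the k nearest other points for each point (brute-force Euclidean)."""
    return [
        [j for _, j in sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]]
        for i, p in enumerate(points)
    ]


def rnn_cluster(points: Sequence[Point], k: int) -> List[int]:
    """Hypothetical single-parameter clustering driven by reverse-kNN counts."""
    knn = knn_indices(points, k)
    rnn_count = [0] * len(points)
    links: List[Set[int]] = [set() for _ in points]
    for i, neighbours in enumerate(knn):
        for j in neighbours:
            rnn_count[j] += 1
            links[i].add(j)  # kNN edges, treated as undirected (assumption)
            links[j].add(i)
    # Assumption: a point is "core" if at least k points count it among their k-NN.
    core = [c >= k for c in rnn_count]
    labels = [NOISE] * len(points)
    next_label = 0
    for start in range(len(points)):
        if not core[start] or labels[start] != NOISE:
            continue
        # Flood-fill: core points joined by a kNN edge share a cluster.
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in links[i]:
                if core[j] and labels[j] == NOISE:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    # Non-core points join the nearest core point among their k-NN, else stay noise.
    for i in range(len(points)):
        if not core[i]:
            for j in knn[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels


def silhouette(points: Sequence[Point], labels: List[int]) -> float:
    """Mean silhouette over non-noise points; -1.0 if there are fewer than two clusters."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        if lab != NOISE:
            clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return -1.0
    scores = []
    for lab, members in clusters.items():
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(math.dist(points[i], points[j]) for j in members if j != i) / (len(members) - 1)
            b = min(
                sum(math.dist(points[i], points[j]) for j in other) / len(other)
                for other_lab, other in clusters.items()
                if other_lab != lab
            )
            denom = max(a, b)
            scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / len(scores)


def choose_k(points: Sequence[Point], candidates: Iterable[int]) -> Tuple[int, List[int]]:
    """Pick the candidate k whose clustering maximises the silhouette (first one on ties)."""
    best = max(candidates, key=lambda k: silhouette(points, rnn_cluster(points, k)))
    return best, rnn_cluster(points, best)
```

On two well-separated groups of four points each, the candidates k = 1 to 3 recover the two groups, whereas k = 4 links every point to the other group and merges everything into one cluster, which this sketch scores as −1, so the search discards it. Scanning a single parameter against an internal index in this way is what removes the need to tune two interacting parameters as in DBSCAN.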

