LG ML · Jun 30, 2019

Nearest-Neighbour-Induced Isolation Similarity and its Impact on Density-Based Clustering

Xiaoyu Qin, Kai Ming Ting, Ye Zhu, Vincent CS Lee

arXiv:1907.00378v1

Originality: Incremental advance

AI Analysis

This work addresses clustering accuracy in data analysis. It is incremental because it modifies an existing similarity measure and applies it to a classic algorithm.

The authors tackle the problem of improving density-based clustering by proposing a nearest-neighbour method to implement Isolation Similarity. Used in place of DBSCAN's distance measure, this similarity lifts DBSCAN's clustering performance above that of the recent density-peak (DP) algorithm.
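The nearest-neighbour method for Isolation Similarity can be illustrated with a small sketch. It assumes the construction works as follows: in each of `t` rounds, draw `psi` points at random from the data, assign every point to its nearest drawn point (so the drawn points induce a Voronoi partition), and define the similarity of two points as the fraction of rounds in which they land in the same cell. This is our reading, not the authors' code. The function name and the parameters `psi`, `t` and `seed` are hypothetical choices made for illustration.

```python
import numpy as np


def isolation_similarity(X, psi=4, t=200, seed=0):
    """Pairwise nearest-neighbour-induced Isolation Similarity (sketch).

    In each of t rounds, psi points are drawn from X without replacement and
    every point of X is assigned to its nearest drawn point, i.e. to one cell
    of the Voronoi diagram of that subsample. The similarity of two points is
    the fraction of rounds in which they fall in the same cell.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    sim = np.zeros((n, n))
    for _ in range(t):
        centres = X[rng.choice(n, size=psi, replace=False)]
        # Squared Euclidean distance from every point to every drawn centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        cell = d2.argmin(axis=1)  # Voronoi cell index of each point
        sim += cell[:, None] == cell[None, :]  # count same-cell pairs
    return sim / t


# Two tight pairs of points, far apart from each other.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
S = isolation_similarity(X, psi=2, t=100)
print(np.round(S, 2))
```

For this toy data with `psi=2`, enumerating the six possible subsamples gives an expected `S[0, 1]` of 5/6 and an expected `S[0, 2]` of 1/6. The estimates above fluctuate around those values. Because the cells are Voronoi regions of random data samples, they are small where data are dense and large where data are sparse. That data dependence is what separates this measure from a fixed distance.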

A recent proposal of data-dependent similarity, called Isolation Kernel/Similarity, has enabled SVM to produce better classification accuracy. We identify shortcomings of using a tree method to implement Isolation Similarity and propose a nearest-neighbour method instead. We formally prove the characteristic of Isolation Similarity when it is implemented with the proposed method. We also study the impact of Isolation Similarity on density-based clustering. We show for the first time that the clustering performance of the classic density-based clustering algorithm DBSCAN can be significantly uplifted to surpass that of the recent density-peak clustering algorithm DP. This is achieved by simply replacing the distance measure in DBSCAN with the proposed nearest-neighbour-induced Isolation Similarity, leaving the rest of the procedure unchanged. A new type of cluster, called a mass-connected cluster, is formally defined. We show that DBSCAN, which detects density-connected clusters, detects mass-connected clusters instead when its distance measure is replaced with the proposed similarity. We also provide the condition under which mass-connected clusters can be detected but density-connected clusters cannot.
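The abstract's key step replaces DBSCAN's distance measure with the similarity and leaves the rest of the procedure unchanged. The sketch below shows one way to do that. It assumes the swap can be made by feeding the dissimilarity `1 - similarity` into an otherwise textbook DBSCAN, with `eps` as a threshold on that dissimilarity. That conversion, the helper names `isolation_similarity`, `dbscan_precomputed` and `isolation_dbscan`, and the parameters `psi`, `t` and `seed` are our assumptions, not necessarily the authors' exact formulation. The similarity follows a Voronoi-cell reading of the nearest-neighbour method: it is the fraction of random subsample partitions in which two points share a cell.

```python
import numpy as np


def isolation_similarity(X, psi, t, seed=0):
    """Fraction of t random Voronoi partitions (psi random centres each)
    in which two points fall in the same cell (assumed construction)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    sim = np.zeros((n, n))
    for _ in range(t):
        centres = X[rng.choice(n, size=psi, replace=False)]
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        cell = d2.argmin(axis=1)  # nearest centre = Voronoi cell
        sim += cell[:, None] == cell[None, :]
    return sim / t


def dbscan_precomputed(D, eps, min_pts):
    """Textbook DBSCAN on a precomputed dissimilarity matrix D (-1 = noise)."""
    n = D.shape[0]
    # eps-neighbourhood of each point; it includes the point itself.
    neighbours = [np.flatnonzero(D[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbours])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        # Start a new cluster at an unlabelled core point and expand it.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neighbours[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def isolation_dbscan(X, eps, min_pts, psi=4, t=200, seed=0):
    """DBSCAN with its distance replaced by 1 - Isolation Similarity."""
    S = isolation_similarity(X, psi, t, seed)
    return dbscan_precomputed(1.0 - S, eps, min_pts)


# Two 3x3 grids of points, far apart.
grid = np.array([[x, y] for x in range(3) for y in range(3)], dtype=float)
X = np.vstack([grid, grid + 20.0])
labels = isolation_dbscan(X, eps=0.8, min_pts=3)
print(labels)
```

Only the matrix passed to `dbscan_precomputed` changes between ordinary DBSCAN and this variant. That mirrors the abstract's point that the rest of the procedure stays the same. In this sketch `eps` lives on the `[0, 1]` scale of `1 - similarity` rather than on a distance scale. How the paper parameterises the neighbourhood may differ.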
