LG · Oct 25, 2023

DECWA : Density-Based Clustering using Wasserstein Distance

arXiv:2310.16552v1 · 5 citations · h-index: 25
Originality: Incremental advance
AI Analysis

This paper addresses limitations of density-based clustering for data analysis, though it appears incremental because it builds on existing density-based methods.

The paper tackles three weaknesses of density-based clustering: difficulty with low-density clusters, with nearby clusters of similar density, and with high-dimensional data. It proposes a new cluster characterization and an algorithm that combine spatial density with the Wasserstein distance, and reports that the approach outperforms state-of-the-art density-based methods on a variety of datasets.

Clustering is a data analysis method for extracting knowledge by discovering groups of data called clusters. Among clustering methods, state-of-the-art density-based approaches have proven effective for arbitrary-shaped clusters. Despite their encouraging results, they struggle to find low-density clusters, to separate nearby clusters with similar densities, and to handle high-dimensional data. We propose a new characterization of clusters and a new clustering algorithm based on spatial density and a probabilistic approach. First, sub-clusters are built using spatial density, represented as the probability density function (p.d.f.) of pairwise distances between points. Similar sub-clusters are then agglomerated using both their density (p.d.f.) and their spatial distance. The key idea is to use the Wasserstein metric, a powerful tool for measuring the distance between the p.d.f.s of sub-clusters. We show that our approach outperforms other state-of-the-art density-based clustering methods on a wide variety of datasets.
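The central comparison described in the abstract, measuring how far apart two sub-clusters are in terms of the distribution of their pairwise distances, can be illustrated with a small sketch. This is not the paper's implementation: the helper names (`make_blob`, `pairwise_distances`, `wasserstein_1d`), the Gaussian toy data, and the use of raw empirical samples in place of an estimated p.d.f. are all assumptions made for illustration, and the spatial-distance criterion mentioned in the abstract is left out. The sketch only uses the standard fact that the 1-Wasserstein distance between two one-dimensional empirical distributions equals the area between their cumulative distribution functions.

```python
import math
import random
from itertools import combinations


def make_blob(center, spread, n, rng):
    """Hypothetical toy data: n 2-D points drawn around `center`."""
    return [(rng.gauss(center[0], spread), rng.gauss(center[1], spread))
            for _ in range(n)]


def pairwise_distances(points):
    """Euclidean distances between all pairs of points in one sub-cluster."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two 1-D empirical distributions,
    computed as the integral of |F_u - F_v| over the merged sample grid."""
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total = 0.0
    i = j = 0
    for left, right in zip(grid, grid[1:]):
        # Advance the counters so i/len(u) and j/len(v) are the CDFs at `left`.
        while i < len(u) and u[i] <= left:
            i += 1
        while j < len(v) and v[j] <= left:
            j += 1
        total += abs(i / len(u) - j / len(v)) * (right - left)
    return total


rng = random.Random(0)
dense_a = make_blob((0.0, 0.0), 0.1, 40, rng)   # tight sub-cluster
dense_b = make_blob((5.0, 5.0), 0.1, 40, rng)   # equally tight, elsewhere
sparse = make_blob((0.0, 5.0), 1.0, 40, rng)    # much more spread out

same_density = wasserstein_1d(pairwise_distances(dense_a),
                              pairwise_distances(dense_b))
diff_density = wasserstein_1d(pairwise_distances(dense_a),
                              pairwise_distances(sparse))
print(f"dense vs dense:  {same_density:.3f}")
print(f"dense vs sparse: {diff_density:.3f}")
```

In this toy setup the two tight blobs have similar pairwise-distance distributions wherever they sit in space, so their Wasserstein distance is small, while the tight and the spread-out blob are far apart in this sense. For one-dimensional samples, `scipy.stats.wasserstein_distance` computes the same quantity and could replace the hand-written helper.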
