LG · Sep 9, 2021

On the use of Wasserstein metric in topological clustering of distributional data

Guénaël Cabanes, Younès Bennani, Rosanna Verde, Antonio Irpino

arXiv:2109.04301v13.13 citations

Originality: Synthesis-oriented

AI Analysis

This is an incremental improvement for researchers in data analysis, focusing on clustering distributional data with automated parameter selection.

The paper tackles clustering of histogram data by combining a Self-Organizing Map for dimension reduction with an automatic cluster number determination based on local density, using the L2 Wasserstein distance as a dissimilarity measure. It demonstrates the approach on synthetic and real datasets, but does not provide concrete numerical results.

This paper presents a clustering algorithm for histogram data based on Self-Organizing Map (SOM) learning. It combines dimension reduction by SOM with clustering of the data in the reduced space. Because the data are distributions, a suitable dissimilarity measure between them is used: the $L_2$ Wasserstein distance. Moreover, the number of clusters is not fixed in advance but is found automatically from a local data density estimate in the original space. Applications to synthetic and real data sets corroborate the proposed strategy.
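For one-dimensional distributions, the $L_2$ Wasserstein distance has a closed form in terms of quantile functions: $d_W^2(f, g) = \int_0^1 \left(F^{-1}(t) - G^{-1}(t)\right)^2 dt$. The following is a minimal sketch of that formula for two histograms, not the authors' implementation. It assumes values are uniformly spread within each bin, that every bin has strictly positive weight, and it approximates the integral on a regular grid of quantile levels. The function names and the `n_levels` parameter are hypothetical.

```python
import numpy as np


def histogram_quantiles(edges, weights, t):
    """Quantile function of a histogram, evaluated at levels t in (0, 1).

    Assumes mass is uniform within each bin and all weights are > 0,
    so the CDF is piecewise linear and strictly increasing.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # CDF values at the bin edges, starting at 0 and ending at 1.
    cdf = np.concatenate(([0.0], np.cumsum(weights / weights.sum())))
    cdf[-1] = 1.0  # guard against floating-point drift in the cumulative sum
    # Inverting a piecewise-linear CDF is a linear interpolation
    # with the roles of x and y swapped.
    return np.interp(t, cdf, edges)


def l2_wasserstein(edges_a, weights_a, edges_b, weights_b, n_levels=1000):
    """Approximate L2 Wasserstein distance between two 1-D histograms."""
    # Midpoint grid of quantile levels on (0, 1).
    t = (np.arange(n_levels) + 0.5) / n_levels
    qa = histogram_quantiles(edges_a, weights_a, t)
    qb = histogram_quantiles(edges_b, weights_b, t)
    # Midpoint-rule approximation of the integral of the squared difference.
    return float(np.sqrt(np.mean((qa - qb) ** 2)))


# Usage: the same histogram shape shifted right by 2 units.
d_shift = l2_wasserstein([0, 1, 2, 3], [0.2, 0.5, 0.3],
                         [2, 3, 4, 5], [0.2, 0.5, 0.3])
print(d_shift)
```

Shifting a distribution by a constant shifts its quantile function by the same constant, so the distance in the usage example should equal the size of the shift. A distance of this kind can stand in for the Euclidean distance when comparing data to SOM prototypes, which is the role the abstract describes for it.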
