LG IT ML Jan 31, 2019

Generalized Dirichlet-process-means for $f$-separable distortion measures

arXiv:1901.11331v33.46 citations

Originality: Incremental advance

AI Analysis

This work addresses robustness issues in clustering algorithms for data analysis, but it is incremental as it builds upon existing DP-means methods.

The authors address the restriction of DP-means clustering to average distortion measures, which leaves it vulnerable to outliers and can produce high maximum distortion within clusters. They extend the objective to $f$-separable distortion measures and propose a unified learning algorithm. Numerical experiments on real datasets are reported to show improved performance, though no concrete numbers were provided.

DP-means clustering was obtained as an extension of $K$-means clustering. While it is implemented with a simple and efficient algorithm, it can estimate the number of clusters simultaneously. However, DP-means is specifically designed for the average distortion measure. Therefore, it is vulnerable to outliers in data, and can cause large maximum distortion in clusters. In this work, we extend the objective function of the DP-means to $f$-separable distortion measures and propose a unified learning algorithm to overcome the above problems by selecting the function $f$. Further, the influence function of the estimated cluster center is analyzed to evaluate the robustness against outliers. We demonstrate the performance of the generalized method by numerical experiments using real datasets.
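For orientation, the baseline that the abstract starts from can be sketched as follows. This is a minimal illustration of standard DP-means with squared Euclidean distortion, not the generalized $f$-separable algorithm proposed in the paper, whose update rule the abstract does not spell out. The function name `dp_means`, the penalty parameter `lam`, and the choice to initialize with a single global mean are assumptions made for this example.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """Baseline DP-means sketch (squared Euclidean distortion).

    lam is an assumed name for the penalty paid when a new cluster is opened.
    Returns (centers, labels).
    """
    X = np.asarray(X, dtype=float)
    centers = [X.mean(axis=0)]            # start from one global cluster
    labels = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        old_labels = labels.copy()
        old_k = len(centers)
        # Assignment step: nearest center, or a new cluster if too far.
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centers]
            j = int(np.argmin(d2))
            if d2[j] > lam:
                centers.append(x.copy())  # open a new cluster at this point
                j = len(centers) - 1
            labels[i] = j
        # Update step: drop empty clusters and recompute each center
        # as the mean of its members (the average-distortion minimizer).
        used = np.unique(labels)
        centers = [X[labels == j].mean(axis=0) for j in used]
        labels = np.searchsorted(used, labels)  # relabel to 0..k-1
        if len(centers) == old_k and np.array_equal(labels, old_labels):
            break
    return np.array(centers), labels

if __name__ == "__main__":
    X = [[0, 0], [0.1, 0], [0, 0.1], [10, 10], [10.1, 10], [10, 10.1]]
    centers, labels = dp_means(X, lam=1.0)
    print(len(centers), labels)
```

The number of clusters is not fixed in advance: it grows whenever a point lies farther than `lam` (in squared distance) from every existing center. Because each center is a plain mean, a single distant point can shift it noticeably, which is the kind of outlier sensitivity the abstract attributes to the average distortion measure.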

