LG · Aug 6, 2021
Rectified Euler k-means and Beyond
Yunxia Lin, Songcan Chen
Euler k-means (EulerK) first maps data onto the unit hyper-sphere surface of an equi-dimensional space via a complex mapping, which induces the robust Euler kernel, and then applies the popular k-means. Consequently, besides enjoying the virtues of k-means such as simplicity and scalability to large data sets, EulerK is also robust to noise and outliers. Even so, the centroids captured by EulerK deviate from the unit hyper-sphere surface (the mean of distinct points on a sphere lies strictly inside it) and thus, in a strict distributional sense, are actually outliers. This odd phenomenon also occurs in some generic kernel clustering methods. Intuitively, using such outlier-like centroids is not entirely reasonable, yet the issue has received little attention. To eliminate the deviation, we propose two Rectified Euler k-means methods, REK1 and REK2, which retain the merits of EulerK while obtaining real centroids that reside in the mapped space and thus better characterize the data structure. Specifically, REK1 rectifies EulerK by imposing a constraint on the centroids, while REK2 views each centroid as the mapped image of a pre-image in the original space and optimizes these pre-images in the Euler-kernel-induced space. Moreover, the proposed REKs can methodologically be extended to other problems of this category. Finally, experiments validate the effectiveness of REK1 and REK2.
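The centroid deviation described above can be reproduced in a few lines. The sketch below is a minimal illustration, not the authors' EulerK, REK1, or REK2 code. It assumes features scaled to [0, 1], a hypothetical map `euler_map` with $z_j = e^{i\alpha\pi x_j}/\sqrt{d}$ (the $1/\sqrt{d}$ factor is our assumption, chosen so that $\|z\| = 1$), and an assumed $\alpha = 1.9$. `rectify_normalize` shows one possible centroid constraint (unit norm), and `preimage` shows one possible pre-image solution. Both are hypothetical stand-ins and are not the paper's REK1 and REK2.

```python
import cmath
import math
import random


def euler_map(x, alpha=1.9):
    """Hypothetical Euler map: z_j = exp(i*alpha*pi*x_j) / sqrt(d), so ||z|| = 1.
    Assumes every feature x_j has been scaled to [0, 1]."""
    d = len(x)
    return [cmath.exp(1j * alpha * math.pi * xj) / math.sqrt(d) for xj in x]


def sq_dist(z, c):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(a - b) ** 2 for a, b in zip(z, c))


def norm(z):
    return math.sqrt(sum(abs(a) ** 2 for a in z))


def euler_kmeans(X, k, alpha=1.9, iters=20):
    """EulerK-style sketch: plain k-means on the mapped points.
    Centroids are arithmetic means, so they generally fall inside the sphere."""
    Z = [euler_map(x, alpha) for x in X]
    centroids = [list(Z[i * len(Z) // k]) for i in range(k)]  # evenly spaced seeds
    labels = [0] * len(Z)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(z, centroids[c])) for z in Z]
        for c in range(k):
            members = [z for z, lab in zip(Z, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rectify_normalize(centroid):
    """Hypothetical REK1-like fix: for unit-norm points, the unit-norm c that
    minimizes sum_i ||z_i - c||^2 is the normalized mean."""
    n = norm(centroid)
    return [a / n for a in centroid]


def preimage(members, alpha=1.9):
    """Hypothetical REK2-like fix: the v minimizing sum_i ||phi(x_i) - phi(v)||^2
    under euler_map is the per-coordinate circular mean (defined modulo 2/alpha)."""
    d = len(members[0])
    return [
        cmath.phase(sum(cmath.exp(1j * alpha * math.pi * x[j]) for x in members))
        / (alpha * math.pi)
        for j in range(d)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[0.1 + 0.1 * rng.random(), 0.1 + 0.1 * rng.random()] for _ in range(20)]
    X += [[0.7 + 0.1 * rng.random(), 0.7 + 0.1 * rng.random()] for _ in range(20)]
    labels, centroids = euler_kmeans(X, k=2)
    for c, cen in enumerate(centroids):
        members = [x for x, lab in zip(X, labels) if lab == c]
        print(c, "mean norm:", round(norm(cen), 4),
              "normalized:", round(norm(rectify_normalize(cen)), 4),
              "pre-image image:", round(norm(euler_map(preimage(members))), 4))
```

The mean-based centroids have norm strictly below 1, while both rectified versions return to the sphere. One difference is worth noting. The normalized mean lies on the sphere but need not lie in the image of `euler_map`, because its entries need not each have modulus $1/\sqrt{d}$. The pre-image's image lies in that set by construction.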
LG · Sep 20, 2020
Convex Subspace Clustering by Adaptive Block Diagonal Representation
Yunxia Lin, Songcan Chen
Subspace clustering is an extensively studied class of clustering methods, of which spectral-type approaches form an important subclass. Their key first step is to learn a representation coefficient matrix with a block diagonal structure. To realize this step, many methods have been proposed that impose different structure priors on the coefficient matrix. These priors can be roughly divided into two categories: indirect and direct. The former introduce priors such as sparsity and low rankness to learn the block diagonal structure indirectly or implicitly. However, the desired block diagonality is not necessarily guaranteed for noisy data. The latter directly or explicitly impose a block diagonal structure prior, as in block diagonal representation (BDR). This ensures the desired block diagonality even when the data are noisy, but at the expense of losing the convexity that the former's objectives possess. To compensate for their respective shortcomings, in this paper we follow the direct line and propose Adaptive Block Diagonal Representation (ABDR), which explicitly pursues block diagonality without sacrificing the convexity of the indirect approaches. Specifically, inspired by Convex BiClustering, ABDR coercively fuses both the columns and the rows of the coefficient matrix via a specially designed convex regularizer. It thereby naturally enjoys the merits of both lines and adaptively obtains the number of blocks. Finally, experimental results on synthetic and real benchmarks demonstrate the superiority of ABDR over state-of-the-art (SOTA) methods.
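To make the column-and-row fusion idea concrete, the sketch below evaluates a generic convex-biclustering-style penalty inside a self-expressive objective $\|X - XC\|_F^2 + \gamma \cdot \mathrm{penalty}(C)$. The penalty is the sum of $\ell_2$ distances between all pairs of columns plus all pairs of rows of the coefficient matrix $C$. This is an assumption for illustration only. The uniform pair weights and the names `fusion_penalty`, `self_expressive_objective`, and `gamma` are hypothetical, and ABDR's specially designed regularizer may differ. Each term is a norm of a linear function of $C$, so the penalty is convex. Identical columns (or rows) contribute nothing, which is what lets fused blocks emerge.

```python
import itertools
import math


def column(M, j):
    """Return column j of a matrix stored as a list of rows."""
    return [row[j] for row in M]


def l2(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def fusion_penalty(C):
    """Convex fusion penalty with uniform weights (illustrative assumption):
    sum_{i<j} ||C[:, i] - C[:, j]||_2 + sum_{i<j} ||C[i, :] - C[j, :]||_2."""
    n_rows, n_cols = len(C), len(C[0])
    cols = [column(C, j) for j in range(n_cols)]
    col_term = sum(l2(cols[i], cols[j]) for i, j in itertools.combinations(range(n_cols), 2))
    row_term = sum(l2(C[i], C[j]) for i, j in itertools.combinations(range(n_rows), 2))
    return col_term + row_term


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, column(B, j))) for j in range(len(B[0]))]
            for row in A]


def self_expressive_objective(X, C, gamma):
    """||X - X C||_F^2 + gamma * fusion_penalty(C); X is d x n (columns are samples)."""
    XC = matmul(X, C)
    residual = sum((x - y) ** 2 for rx, ry in zip(X, XC) for x, y in zip(rx, ry))
    return residual + gamma * fusion_penalty(C)


if __name__ == "__main__":
    # Two blocks: samples 0 and 1 share a block, and sample 2 is on its own.
    C = [[0.5, 0.5, 0.0],
         [0.5, 0.5, 0.0],
         [0.0, 0.0, 1.0]]
    print(fusion_penalty(C))  # 4 * sqrt(1.5), about 4.899
```

In the example, columns 0 and 1 coincide, and so do rows 0 and 1, so only the pairs involving index 2 are charged. A larger `gamma` pushes more columns and rows to coincide, which reduces the number of distinct blocks.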
LG · Apr 27, 2020
A Centroid Auto-Fused Hierarchical Fuzzy c-Means Clustering
Yunxia Lin, Songcan Chen
Like k-means and the Gaussian Mixture Model (GMM), fuzzy c-means (FCM) with soft partition has become a popular clustering algorithm and is still extensively studied. However, these algorithms and their variants still suffer from difficulties such as determining the optimal number of clusters, a key factor in clustering quality. A common approach to overcoming this difficulty is the trial-and-validation strategy, i.e., traversing every integer from a large value such as $\sqrt{n}$ down to 2 and selecting the number that corresponds to the peak value of some cluster validity index. However, this strategy can scarcely construct an adaptively agglomerative hierarchical cluster structure in a natural way. Even where that is possible, different existing validity indices lead to different numbers of clusters. To effectively mitigate these problems, and motivated by convex clustering, in this paper we present a Centroid Auto-Fused Hierarchical Fuzzy c-means method (CAF-HFCM). Its optimization procedure automatically agglomerates to form a cluster hierarchy and, more importantly, yields an optimal number of clusters without resorting to any validity index. The recently proposed robust-learning fuzzy c-means (RL-FCM) can also automatically obtain the best number of clusters without the help of any validity index, but its three hyper-parameters are expensive to tune. In contrast, CAF-HFCM involves just one hyper-parameter, which makes tuning relatively easier and more practical. Furthermore, as an additional benefit of our optimization objective, CAF-HFCM effectively reduces the sensitivity of clustering performance to initialization. Moreover, the proposed CAF-HFCM method can be straightforwardly extended to various variants of FCM.
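For contrast, the trial-and-validation baseline that CAF-HFCM is designed to avoid can be sketched with standard FCM updates. The validity index here is Bezdek's partition coefficient, $PC = \frac{1}{n}\sum_k \sum_i u_{ik}^2$. This is not CAF-HFCM. The function names, the fuzzifier $m = 2$, the fixed iteration count, the random seeding, and the choice of PC as the index are all illustrative assumptions. A different index may select a different number of clusters, which is the issue raised above.

```python
import math
import random


def fcm(X, c, m=2.0, iters=100, seed=0, eps=1e-12):
    """Standard fuzzy c-means. X is a list of points (lists of floats).
    Returns the membership matrix U (n x c) and the c centroids."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(X, c)]  # distinct data points as seeds
    dim = len(X[0])
    U = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        U = []
        for x in X:
            d = [max(math.dist(x, v), eps) for v in centers]
            U.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                      for i in range(c)])
        # Centroid update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m.
        for i in range(c):
            w = [u[i] ** m for u in U]
            total = sum(w)
            centers[i] = [sum(wk * x[t] for wk, x in zip(w, X)) / total for t in range(dim)]
    return U, centers


def partition_coefficient(U):
    """Bezdek's PC lies in [1/c, 1]; larger means crisper memberships."""
    return sum(u ** 2 for row in U for u in row) / len(U)


def trial_and_validation(X, m=2.0):
    """Scan c from floor(sqrt(n)) down to 2 and keep the c with the peak PC."""
    scores = {}
    for c in range(math.isqrt(len(X)), 1, -1):
        U, _ = fcm(X, c, m=m)
        scores[c] = partition_coefficient(U)
    best_c = max(scores, key=scores.get)
    return best_c, scores


if __name__ == "__main__":
    rng = random.Random(1)
    X = [[cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)]
         for cx, cy in [(0, 0), (5, 0), (0, 5)] for _ in range(10)]
    best_c, scores = trial_and_validation(X)
    print(best_c, {c: round(s, 3) for c, s in scores.items()})
```

Each candidate `c` requires a full FCM run, and the outcome depends on both the seeding and the index chosen. These are the costs that the abstract's index-free, single-hyper-parameter approach aims to remove.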