LG · Feb 3, 2023

Uniform tensor clustering by jointly exploring sample affinities of various orders

Hongmin Cai, Fei Qi, Junyu Li, Yu Hu, Yue Zhang, Yiu-ming Cheung, Bin Hu

arXiv:2302.01569v1 · 2.02 citations · h-index: 23

Originality: Incremental advance

AI Analysis

This is an incremental improvement for clustering high-dimensional data with low sample sizes.

The authors tackled the problem of clustering high-dimensional data with low sample sizes by proposing a unified tensor clustering method that uses multiple orders of sample affinity to improve accuracy, demonstrating enhanced clustering performance on synthetic and real-world datasets.

Conventional clustering methods based on pairwise affinity usually suffer from the concentration effect when processing data with very high-dimensional features but small sample sizes. As a result, they encode sample proximity inaccurately and cluster suboptimally. To address this issue, we propose a unified tensor clustering method (UTC) that characterizes sample proximity using the affinities among multiple samples, thereby supplementing rich information on the spatial distribution of samples to boost clustering. Specifically, we find that the triadic (third-order) tensor affinity can be constructed via the Khatri-Rao product of two affinity matrices. Furthermore, our earlier work shows that the fourth-order tensor affinity is defined by the Kronecker product. Therefore, we use these two arithmetical products, the Khatri-Rao and Kronecker products, to mathematically integrate affinities of different orders into a unified tensor clustering framework. In this way, UTC learns a joint low-dimensional embedding that combines the various orders. Finally, a numerical scheme is designed to solve the resulting problem. Experiments on synthetic and real-world datasets demonstrate that 1) high-order tensor affinity can provide a characterization of sample proximity that supplements the popular affinity matrix; and 2) the proposed UTC enhances clustering by exploiting affinities of different orders when processing high-dimensional data.
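The abstract states that the triadic affinity comes from the Khatri-Rao product of two affinity matrices and the fourth-order affinity from the Kronecker product. Below is a minimal NumPy sketch of that construction only. It assumes a Gaussian (RBF) pairwise affinity and the index layout spelled out in the comments; the kernel, the bandwidth `sigma`, and the helper names (`gaussian_affinity`, `khatri_rao`, `third_order_affinity`, `fourth_order_affinity`) are illustrative assumptions, not the authors' UTC implementation. It also stops before the joint embedding and the numerical solver.

```python
import numpy as np


def gaussian_affinity(X: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Pairwise affinity S[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    Assumption: the RBF kernel is an illustrative choice; the paper may
    build its pairwise affinity matrix differently.
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def khatri_rao(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Column-wise Kronecker product: column j equals kron(A[:, j], B[:, j])."""
    n_rows_a, n_cols = A.shape
    n_rows_b, n_cols_b = B.shape
    if n_cols != n_cols_b:
        raise ValueError("A and B must have the same number of columns")
    # einsum gives out[i, k, j] = A[i, j] * B[k, j]; a C-order reshape maps
    # (i, k) to row i * n_rows_b + k, which is the Kronecker row layout.
    return np.einsum("ij,kj->ikj", A, B).reshape(n_rows_a * n_rows_b, n_cols)


def third_order_affinity(S: np.ndarray) -> np.ndarray:
    """Hypothetical triadic affinity T[i, k, j] = S[i, j] * S[k, j].

    The n^2 x n Khatri-Rao product is folded into an n x n x n tensor.
    """
    n = S.shape[0]
    return khatri_rao(S, S).reshape(n, n, n)


def fourth_order_affinity(S: np.ndarray) -> np.ndarray:
    """Hypothetical fourth-order affinity Q[i, k, j, l] = S[i, j] * S[k, l].

    np.kron(S, S)[i * n + k, j * n + l] = S[i, j] * S[k, l], so a C-order
    reshape of the n^2 x n^2 matrix yields the n x n x n x n tensor.
    """
    n = S.shape[0]
    return np.kron(S, S).reshape(n, n, n, n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 500))  # few samples, many features
    S = gaussian_affinity(X, sigma=np.sqrt(500.0))
    T = third_order_affinity(S)
    Q = fourth_order_affinity(S)
    print(S.shape, T.shape, Q.shape)  # prints (6, 6) (6, 6, 6) (6, 6, 6, 6)
```

The tensors hold n³ and n⁴ entries, so materializing them explicitly is practical mainly when the number of samples n is small, as in the low-sample-size regime described above.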
