LG · Mar 6, 2024

CDC: A Simple Framework for Complex Data Clustering

Zhao Kang, Xuanting Xie, Bingheng Li, Erlin Pan

arXiv:2403.03670v217.064 citations · h-index: 7 · IEEE Trans Neural Netw Learn Syst

Originality: Incremental advance

AI Analysis

This work addresses the need for a unified clustering method for complex data in data-driven applications, though it appears incremental because it builds on existing graph filtering and anchor-based techniques.

The paper tackles the problem of clustering complex data types like multi-view and non-Euclidean data by proposing a simple framework (CDC) that efficiently processes different data types with linear complexity, achieving scalability on graph data up to 111M nodes.

In today's data-driven digital era, both the amount and the complexity (multi-view, non-Euclidean, multi-relational) of collected data are growing exponentially or even faster. Clustering, which extracts valid knowledge from data without supervision, is extremely useful in practice. However, existing methods are developed independently, each handling one particular challenge at the expense of the others. In this work, we propose a simple but effective framework for complex data clustering (CDC) that can efficiently process different types of data with linear complexity. We first utilize graph filtering to fuse geometric structure and attribute information. We then reduce the complexity with high-quality anchors that are adaptively learned via a novel similarity-preserving regularizer. We illustrate the cluster-ability of our proposed method both theoretically and experimentally. In particular, we deploy CDC on graph data with 111M nodes.
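The two steps named in the abstract (graph filtering, then an anchor-based similarity model) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation. It makes several assumptions that do not come from the abstract: the filter is the common low-pass form (I - L/2)^k, anchors are evenly spaced nodes instead of being adaptively learned, a plain ridge regression stands in for the similarity-preserving regularizer, and all function and parameter names are hypothetical. Dense matrices are used for brevity, so the sketch does not have the linear complexity the paper reports.

```python
import numpy as np


def low_pass_filter(adj, feats, order=2):
    """Smooth node features with (I - L/2)^order (assumed filter form)."""
    n = adj.shape[0]
    adj = adj + np.eye(n)                      # self-loops avoid zero degrees
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    filt = 0.5 * (np.eye(n) + norm_adj)        # I - L/2 with L = I - norm_adj
    out = feats.copy()
    for _ in range(order):
        out = filt @ out
    return out


def anchor_graph(feats, anchors, alpha=1.0):
    """Ridge-regression stand-in: min_Z ||H - Z B||^2 + alpha ||Z||^2."""
    m = anchors.shape[0]
    gram = anchors @ anchors.T + alpha * np.eye(m)
    # Closed form Z = H B^T (B B^T + alpha I)^{-1}; Z has shape (n, m).
    return np.linalg.solve(gram, anchors @ feats.T).T


def kmeans(points, k, iters=50):
    """Small Lloyd's k-means with deterministic farthest-point init."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cdc_like_clustering(adj, feats, n_clusters, n_anchors, order=2, alpha=1.0):
    """Filter, build an n-by-m anchor graph, embed via SVD, then run k-means."""
    n = adj.shape[0]
    smoothed = low_pass_filter(adj, feats, order)
    # Assumption: evenly spaced nodes as anchors (the paper learns them).
    anchor_idx = np.linspace(0, n - 1, n_anchors).astype(int)
    z = anchor_graph(smoothed, smoothed[anchor_idx], alpha)
    u, _, _ = np.linalg.svd(z, full_matrices=False)
    return kmeans(u[:, :n_clusters], n_clusters)
```

Because the anchor graph has only m columns, the SVD and the regression scale with the number of nodes n rather than n squared, which is the general reason anchor-based methods are attractive for large graphs.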
