LG · CV · ML · Dec 24, 2014

An Effective Semi-supervised Divisive Clustering Algorithm

arXiv:1412.7625v2 · 4 citations

Originality: Incremental advance

AI Analysis

This work addresses clustering problems in fields like bioinformatics and engineering, offering an incremental improvement with a novel semi-supervised approach.

The paper tackles the challenge of clustering massive, rapidly generated data. It proposes a semi-supervised divisive clustering algorithm (SDC) that organizes the data with a minimal spanning tree, converts that tree into an in-tree structure, and divides it under the supervision of labeled data. The resulting method is fully automatic, non-iterative, and parameter-free; it is insensitive to noise and handles high-dimensional, irregularly shaped data.

Nowadays, data are generated massively and rapidly in fields ranging from scientific domains such as bioinformatics, neuroscience and astronomy to business and engineering. Cluster analysis, as one of the major data analysis tools, therefore matters more than ever. In this work we propose an effective Semi-supervised Divisive Clustering algorithm (SDC). Data points are first organized by a minimal spanning tree. Next, this tree is converted into an in-tree structure and then divided into sub-trees under the supervision of the labeled data. Finally, all points in each sub-tree are directly associated with a specific cluster center. SDC is fully automatic, non-iterative, free of tunable parameters, insensitive to noise, able to detect irregularly shaped cluster structures, and applicable to high-dimensional datasets with different attribute types. The effectiveness of SDC is demonstrated on several datasets.
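The abstract names the pipeline (minimal spanning tree, in-tree, supervised division, assignment to cluster centers) but not the exact rule used at each step. The Python sketch below is a hypothetical reading under stated assumptions: Euclidean distances, Prim's algorithm for the tree, and a supervision rule that cuts the longest tree edge on the path between any two differently labeled points. Each remaining sub-tree is then rooted at one labeled point, which plays the role of its cluster center, and every other point gets a parent pointer toward that root. The function names (`minimum_spanning_tree`, `tree_path`, `semi_supervised_divide`) and the cutting rule are illustrative assumptions, not the authors' SDC procedure.

```python
import math
from collections import deque


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns an adjacency dict."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight linking each node to the tree
    link = [-1] * n        # tree node at the other end of that cheapest edge
    best[0] = 0.0
    adj = {i: {} for i in range(n)}
    for _ in range(n):
        # Add the outside node with the cheapest connection to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if link[u] >= 0:
            adj[u][link[u]] = adj[link[u]][u] = best[u]
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], link[v] = d, u
    return adj


def tree_path(adj, src, dst):
    """Node sequence from src to dst in a forest, or None if they are disconnected."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:
        path.append(prev[path[-1]])
    return path[::-1]


def semi_supervised_divide(points, labels):
    """Hypothetical SDC-style sketch; labels maps point index -> class (needs >= 1 entry)."""
    adj = minimum_spanning_tree(points)
    seeds = sorted(labels)
    # ASSUMED supervision rule: separate every pair of differently labeled seeds
    # by cutting the longest edge on the tree path between them.
    for k, a in enumerate(seeds):
        for b in seeds[k + 1:]:
            if labels[a] == labels[b]:
                continue
            path = tree_path(adj, a, b)
            if path is None:
                continue  # already separated by an earlier cut
            u, v = max(zip(path, path[1:]), key=lambda e: adj[e[0]][e[1]])
            del adj[u][v], adj[v][u]
    # ASSUMED in-tree step: root each sub-tree at one seed (its cluster center)
    # and give every other node a parent on its way to that root.
    parent = {}
    for root in seeds:
        if root in parent:
            continue
        parent[root] = root
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    queue.append(v)
    # Follow parent pointers to the root and inherit the root's label.
    result = []
    for i in range(len(points)):
        r = i
        while parent[r] != r:
            r = parent[r]
        result.append(labels[r])
    return result
```

For example, `semi_supervised_divide([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], {0: "A", 3: "B"})` cuts the single long bridge edge between the two groups and returns `["A", "A", "A", "B", "B", "B"]`. After all cuts, no two differently labeled seeds share a sub-tree, and every sub-tree still contains at least one seed, so each point reaches exactly one labeled root.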

