AI · Jun 12, 2015

Leading Tree in DPCLUS and Its Impact on Building Hierarchies

arXiv:1506.03879v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work offers an incremental improvement for researchers and practitioners in data mining by speeding up cluster assignment in hierarchical clustering methods.

The paper identifies a tree structure within the DPCLUS clustering algorithm and uses it to accelerate hierarchical clustering by converting nearest neighbor indices into a Leading Tree, which reduces assignment time and provides a more informative cluster representation.

This paper shows that clustering by fast search and find of density peaks (DPCLUS) produces a tree structure as an intermediate result, and explores the use of this tree for hierarchical clustering. The array that holds, for each object, the index of its nearest object of higher density can be transformed into a Leading Tree (LT), in which each parent node P leads its child nodes to join the same cluster as P itself. The child nodes are sorted by their gamma values in descending order, which speeds up disconnecting the root of each subtree. The LT has two major advantages. First, it dramatically reduces the running time of assigning non-center data points to their cluster IDs, because the assignment process reduces to disconnecting the link from each center to its parent. Second, the tree model is a more informative representation of clusters, because it shows which objects are more likely to be selected as centers in a finer-grained clustering and which objects reach their center in fewer jumps. Experimental results and analysis show the effectiveness and efficiency of the assignment process with an LT.
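The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `leading_tree` and `assign_clusters`, the cutoff parameter `dc`, the count-based density estimate, and the brute-force O(n^2) distance matrix are all assumptions made for illustration. Gamma is taken as the product of density and distance to the nearest higher-density object, as in the original density peaks method. The sketch does not sort children by gamma; it only shows that cutting the link from each center to its parent is enough to determine every cluster assignment.

```python
import numpy as np

def leading_tree(points, dc):
    """Return (rho, delta, parent) for a small data set (brute-force distances)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Local density: number of other points closer than the cutoff dc (assumed kernel).
    rho = (dist < dc).sum(axis=1) - 1
    # Visit points from densest to sparsest; ties are broken by index.
    order = np.argsort(-rho, kind="stable")
    parent = np.full(n, -1)          # -1 marks the root of the tree
    delta = np.zeros(n)
    for pos in range(1, n):
        i = order[pos]
        higher = order[:pos]         # points that precede i in the density order
        j = higher[np.argmin(dist[i, higher])]
        parent[i] = j                # nearest object of higher density leads i
        delta[i] = dist[i, j]
    # The densest point has no leader; give it the largest distance in its row.
    delta[order[0]] = dist[order[0]].max()
    return rho, delta, parent

def assign_clusters(rho, delta, parent, k):
    """Pick the k largest-gamma points as centers and label every point."""
    n = len(parent)
    gamma = rho * delta
    order = np.argsort(-rho, kind="stable")
    rank = np.empty(n, dtype=int)
    rank[order] = np.arange(n)
    # Sort by gamma (descending); on ties prefer the denser point, so the
    # tree root is always chosen as a center.
    centers = np.lexsort((rank, -gamma))[:k]
    parent = parent.copy()
    parent[centers] = -1             # disconnect each center from its parent
    labels = np.full(n, -1)
    labels[centers] = np.arange(k)
    # Parents precede children in the density order, so one pass suffices.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers
```

Calling `leading_tree` once and then `assign_clusters` with different values of `k` reuses the same tree, which is what makes it convenient for building a hierarchy of coarser and finer clusterings.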

