LG · ML · Jun 27, 2012

A Split-Merge Framework for Comparing Clusterings

arXiv:1206.6475v2 · 10 citations
AI Analysis

This work addresses the need of researchers and practitioners in data analysis for better-normalized and more informative clustering comparison methods, though it appears incremental because it builds on existing component-based formulas.

The authors address two shortcomings of clustering evaluation measures: improper normalization and disregard for structural information. They propose a split-merge framework built on a bipartite graph model of the relation between two clusterings, and they use an entropy-based instance of the framework on a coreference resolution data set to demonstrate empirically its utility over other measures.

Clustering evaluation measures are frequently used to evaluate the performance of algorithms. However, most measures are not properly normalized and ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. In order to satisfy consistency in the component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures.
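
The bipartite graph model mentioned in the abstract can be illustrated with a minimal sketch. It assumes, since the abstract does not spell this out, that each cluster of either clustering is a node and that two clusters are linked when they share at least one data point. The function name `overlap_components` and the `"A"`/`"B"` side tags are hypothetical. The sketch only extracts the connected components of that graph; it does not implement the paper's decomposition formula or its split-merge measures.

```python
from collections import defaultdict


def overlap_components(labels_a, labels_b):
    """Group the clusters of two clusterings into connected components.

    Assumption (not taken from the abstract): cluster a of clustering A and
    cluster b of clustering B are adjacent when some data point carries
    label a in A and label b in B.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same data points")

    parent = {}  # union-find forest over nodes ("A", label) and ("B", label)

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(u, v):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u

    # Every data point contributes one edge between its two clusters.
    for a, b in zip(labels_a, labels_b):
        union(("A", a), ("B", b))

    # Collect the cluster labels of each side, component by component.
    components = defaultdict(lambda: {"A": set(), "B": set()})
    for side, label in list(parent):
        components[find((side, label))][side].add(label)
    return list(components.values())


if __name__ == "__main__":
    labels_a = [0, 0, 1, 1, 2, 2]
    labels_b = [0, 0, 0, 1, 2, 2]
    for component in overlap_components(labels_a, labels_b):
        print(sorted(component["A"]), sorted(component["B"]))
```

In this toy input, clusters 0 and 1 of both clusterings overlap and fall into one component, while cluster 2 matches exactly and forms a component on its own. A component-based measure would then score each component separately before combining the scores.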

