LG ML May 4, 2017

Semi-supervised model-based clustering with controlled clusters leakage

arXiv:1705.01877v1
Originality: Incremental advance
AI Analysis

The method addresses the need for expert systems to combine expert knowledge with the data distribution when clustering. It appears incremental because it builds on existing semi-supervised models.

The paper tackles the problem of clustering partially categorized data by proposing C3L, a semi-supervised Gaussian mixture model that controls inconsistency between initial categories and resulting clusters, with experimental results showing it finds high-quality clustering models.

In this paper, we focus on finding clusters in partially categorized data sets. We propose a semi-supervised version of the Gaussian mixture model, called C3L, which retrieves natural subgroups of given categories. In contrast to other semi-supervised models, C3L is parametrized by a user-defined leakage level, which controls the maximal inconsistency between the initial categorization and the resulting clustering. Our method can be implemented as a module in practical expert systems to detect clusters that combine expert knowledge with the true distribution of the data. Moreover, it can be used to improve the results of less flexible clustering techniques, such as projection pursuit clustering. The paper presents an extensive theoretical analysis of the model and a fast algorithm for its optimization. Experimental results show that C3L finds a high-quality clustering model, which can be applied to discover meaningful groups in partially classified data.

