OC | LG | NE | Mar 7, 2024

Memetic Differential Evolution Methods for Semi-Supervised Clustering

arXiv:2403.04322v2 | 1 citation | h-index: 6 | J Classif
Originality: Incremental advance
AI Analysis

This work addresses semi-supervised clustering for data analysis applications, but it is incremental as it adapts an existing framework to a constrained version of the problem.

The authors tackle the semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problem by extending a memetic differential evolution framework to handle must-link and cannot-link constraints. The resulting method, S-MDEClust, is reported to be effective and efficient in computational experiments.

In this paper, we propose an extension of MDEClust, a memetic framework based on the Differential Evolution paradigm for unsupervised clustering, to semi-supervised Minimum Sum-of-Squares Clustering (MSSC) problems. In semi-supervised MSSC, background knowledge is available in the form of instance-level "must-link" and "cannot-link" constraints, which indicate that two dataset points should be assigned to the same cluster or to different clusters, respectively. The presence of such constraints makes the problem at least as hard as its unsupervised version; as a consequence, some framework operations need to be carefully designed to handle this additional complexity. For instance, it is no longer true that each point is assigned to its nearest cluster center. As far as we know, our new framework, called S-MDEClust, is the first memetic methodology designed to generate a (hopefully) optimal feasible solution for semi-supervised MSSC problems. Results of thorough computational experiments on both well-known and synthetic datasets show the effectiveness and efficiency of our proposal.
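
To make the point about nearest-center assignment concrete, here is a minimal Python sketch on a toy dataset. It is not the S-MDEClust code: the function names, the four data points, the fixed centers, and the single must-link pair are all assumptions chosen for illustration. It only shows that the unconstrained nearest-center assignment can violate a must-link constraint, and that a feasible assignment may have to place a point in a cluster whose center is not the closest one.

```python
import numpy as np

def mssc_objective(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return float(np.sum((X - centers[labels]) ** 2))

def nearest_center_labels(X, centers):
    """Unconstrained assignment: each point goes to its closest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def is_feasible(labels, must_link, cannot_link):
    """Check every must-link and cannot-link pair against an assignment."""
    ml_ok = all(labels[i] == labels[j] for i, j in must_link)
    cl_ok = all(labels[i] != labels[j] for i, j in cannot_link)
    return ml_ok and cl_ok

# Toy data (hypothetical): four points on a line, two fixed centers.
X = np.array([[0.0, 0.0], [1.0, 0.0], [4.0, 0.0], [5.0, 0.0]])
centers = np.array([[0.5, 0.0], [4.5, 0.0]])
must_link = [(1, 2)]   # points 1 and 2 must share a cluster
cannot_link = []       # no cannot-link pairs in this toy case

# Nearest-center assignment splits points 1 and 2, so it is infeasible.
labels_nn = nearest_center_labels(X, centers)
nn_feasible = is_feasible(labels_nn, must_link, cannot_link)

# One feasible assignment for the same centers: point 2 joins cluster 0,
# even though center 1 is closer to it.
labels_feasible = np.array([0, 0, 0, 1])
obj_nn = mssc_objective(X, labels_nn, centers)
obj_feasible = mssc_objective(X, labels_feasible, centers)

print(labels_nn, nn_feasible, obj_nn)
print(labels_feasible, is_feasible(labels_feasible, must_link, cannot_link), obj_feasible)
```

In this toy case the feasible assignment has a higher objective value for the same centers, which is the price of satisfying the constraint. A full method would also update the centers and search over assignments; none of that is sketched here.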

Code Implementations: 1 repo
