LGGTMLFeb 7, 2020

A novel initialisation based on hospital-resident assignment for the k-modes algorithm

arXiv:2002.02701v11 citations
AI Analysis

This work addresses an incremental improvement for clustering categorical data, benefiting researchers and practitioners using k-modes algorithms.

The paper tackled the problem of selecting initial solutions for the k-modes algorithm by proposing a method based on the Hospital-Resident Assignment Problem to improve fairness and data leverage. The result showed that this method outperformed existing initializations in most cases, particularly when optimizing the number of clusters and for low-density data.

This paper presents a new way of selecting an initial solution for the k-modes algorithm that allows for a notion of mathematical fairness and a leverage of the data that the common initialisations from literature do not. The method, which utilises the Hospital-Resident Assignment Problem to find the set of initial cluster centroids, is compared with the current initialisations on both benchmark datasets and a body of newly generated artificial datasets. Based on this analysis, the proposed method is shown to outperform the other initialisations in the majority of cases, especially when the number of clusters is optimised. In addition, we find that our method outperforms the leading established method specifically for low-density data.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes