LG · Dec 1, 2020

Improving cluster recovery with feature rescaling factors

arXiv:2012.00477v1 · 2 citations
AI Analysis

This work is aimed at researchers and practitioners who use clustering algorithms. It seeks to improve cluster recovery through an incremental change to the data preprocessing stage.

This paper proposes a feature rescaling method for clustering that favours the features most meaningful for clustering, rather than treating all features identically. The authors' simulations on real and synthetic data show that clustering methods using the proposed normalisation strategy outperform those using traditional normalisation.

The data preprocessing stage is crucial in clustering. Features may describe entities using different scales. To rectify this, one usually applies feature normalisation, rescaling the features so that none of them overpowers the others in the objective function of the selected clustering algorithm. In this paper, we argue that the rescaling procedure should not treat all features identically. Instead, it should favour the features that are more meaningful for clustering. With this in mind, we introduce a feature rescaling method that takes into account the within-cluster degree of relevance of each feature. Our comprehensive simulation study, carried out on real and synthetic data, with and without noise features, demonstrates that clustering methods using the proposed data normalisation strategy clearly outperform those using traditional data normalisation.
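The idea of rescaling features by their within-cluster relevance can be illustrated with a small sketch. The abstract does not give the paper's formula, so the code below is not the authors' method: it assumes a provisional cluster labelling is already available (for example from an initial clustering run) and uses a hypothetical factor, the normalised inverse of each feature's within-cluster dispersion. The function names `range_normalize`, `relevance_factors`, and `rescale` are invented for this illustration.

```python
def range_normalize(data):
    """Traditional rescaling: centre each feature on its mean, divide by its range."""
    n_features = len(data[0])
    columns = [[row[v] for row in data] for v in range(n_features)]
    means = [sum(col) / len(col) for col in columns]
    # Guard against constant features, whose range would be zero.
    ranges = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(row[v] - means[v]) / ranges[v] for v in range(n_features)]
            for row in data]


def relevance_factors(data, labels):
    """Hypothetical factors (an assumption, not the paper's formula):
    a feature with lower within-cluster dispersion gets a larger factor."""
    n_features = len(data[0])
    dispersion = [0.0] * n_features
    for k in set(labels):
        members = [row for row, lab in zip(data, labels) if lab == k]
        for v in range(n_features):
            centre = sum(row[v] for row in members) / len(members)
            dispersion[v] += sum((row[v] - centre) ** 2 for row in members)
    eps = 1e-12  # avoids division by zero for perfectly compact features
    inverse = [1.0 / (d + eps) for d in dispersion]
    total = sum(inverse)
    return [w / total for w in inverse]  # factors sum to 1


def rescale(data, factors):
    """Multiply every feature by its factor."""
    return [[x * f for x, f in zip(row, factors)] for row in data]


# Toy data: feature 0 separates the two groups, feature 1 behaves like noise.
data = [
    [1.0, 10.0],
    [1.2, 90.0],
    [5.0, 20.0],
    [5.2, 80.0],
]
labels = [0, 0, 1, 1]  # assumed provisional cluster labels

normalized = range_normalize(data)
factors = relevance_factors(normalized, labels)
rescaled = rescale(normalized, factors)
```

On this toy data, range normalisation alone gives both features the same spread, whereas the hypothetical factors give feature 0 a much larger weight than the noise-like feature 1, so feature 0 dominates any distance computed on `rescaled`.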
