LG · ML · Dec 24, 2019

Variable feature weighted fuzzy k-means algorithm for high dimensional data

arXiv:1912.11209v2 · 5 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of handling irrelevant or variably significant features in clustering for real-world applications, representing an incremental improvement over existing fuzzy k-means methods.

The paper tackles the problem of clustering high-dimensional data by proposing a variable feature weighted fuzzy k-means algorithm that assigns cluster-dependent weights to features. The method achieves higher AR, RI, and NMI scores than six state-of-the-art methods across multiple datasets.
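The three scores compare a predicted partition with ground-truth labels by looking at how the partitions treat pairs of points (RI, AR) or how much information they share (NMI). The sketch below computes them in plain Python. It reads "AR" as the adjusted Rand index, which is an assumption because the abbreviation is not expanded here, and it uses the arithmetic-mean normalization for NMI (other normalizations exist). The function names are illustrative; scikit-learn's `sklearn.metrics` module offers `rand_score`, `adjusted_rand_score`, and `normalized_mutual_info_score` for cross-checking.

```python
import math
from collections import Counter
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """RI: fraction of point pairs on which the two labelings agree
    (both put the pair together, or both keep it apart). Needs >= 2 points."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agree = sum(
        (labels_true[a] == labels_true[b]) == (labels_pred[a] == labels_pred[b])
        for a, b in pairs
    )
    return agree / len(pairs)


def _pairs(m):
    """Number of unordered pairs among m items, C(m, 2)."""
    return m * (m - 1) / 2


def adjusted_rand_index(labels_true, labels_pred):
    """ARI: pair-counting agreement corrected for chance, from the contingency table."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(_pairs(c) for c in cells.values())
    sum_rows = sum(_pairs(c) for c in Counter(labels_true).values())
    sum_cols = sum(_pairs(c) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / _pairs(n)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_info(labels_true, labels_pred):
    """NMI = I(T; P) / ((H(T) + H(P)) / 2), using natural logarithms."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    rows, cols = Counter(labels_true), Counter(labels_pred)
    mi = sum(c / n * math.log(n * c / (rows[t] * cols[p])) for (t, p), c in cells.items())
    h_true = -sum(c / n * math.log(c / n) for c in rows.values())
    h_pred = -sum(c / n * math.log(c / n) for c in cols.values())
    denom = (h_true + h_pred) / 2
    return 1.0 if denom == 0 else mi / denom


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(rand_index(truth, pred))  # 10 of 15 pairs agree
    print(adjusted_rand_index(truth, pred))
    print(normalized_mutual_info(truth, pred))
```

All three scores depend only on the partitions, not on the label IDs, so relabeling clusters (e.g. swapping 0 and 1) leaves them unchanged.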

This paper presents a new fuzzy k-means algorithm for clustering high-dimensional data in various subspaces. In high-dimensional data, some features may be irrelevant, and even relevant features may differ in significance during clustering. For better clustering, it is crucial to account for the contribution of each feature. To this end, we propose a novel fuzzy k-means clustering algorithm that modifies the fuzzy k-means objective function with two different entropy terms. The first entropy term helps minimize the within-cluster dispersion and maximize the negative entropy, which determines how much each cluster contributes to the association of data points. The second entropy term controls the feature weights: different features contribute with different weights during clustering, and weighting them appropriately yields a better partition. The performance of the proposed approach is evaluated with several clustering measures (AR, RI, and NMI) on multiple datasets and compared with six other state-of-the-art methods.

Impact Statement: In real-world applications, cluster-dependent feature weights help partition a dataset into more meaningful clusters. Features may be relevant, irrelevant, or redundant, and each contributes differently to the clustering process. In this paper, a cluster-dependent feature weighting approach based on fuzzy k-means is presented that assigns higher weights to relevant features and lower weights to irrelevant ones during clustering. The method is validated with both supervised and unsupervised performance measures on real-world and synthetic datasets to demonstrate its effectiveness compared to state-of-the-art methods.
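The two-entropy idea above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact objective or update rules, so the code assumes an entropy-regularized form in the style of entropy-weighted k-means, with memberships u_ij (summing to 1 over clusters for each point i), centers v_jk, and cluster-dependent feature weights w_jk (summing to 1 over features for each cluster j). Setting the Lagrangian derivatives to zero gives softmax-style updates: u_ij proportional to exp(-d_ij / gamma) with d_ij = sum_k w_jk (x_ik - v_jk)^2, v_jk as the membership-weighted mean, and w_jk proportional to exp(-D_jk / delta) with D_jk = sum_i u_ij (x_ik - v_jk)^2. The function name `weighted_fuzzy_kmeans`, the parameters `gamma` and `delta`, the fixed iteration count, and the farthest-first initialization are hypothetical choices made for this illustration.

```python
import math


def _sqdist(a, b):
    """Plain squared Euclidean distance (used only for initialization)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _softmin(values, temperature):
    """Normalized exp(-v / temperature), shifted by min(values) to avoid overflow."""
    lo = min(values)
    exps = [math.exp(-(v - lo) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fuzzy_kmeans(X, n_clusters, gamma=1.0, delta=1.0, n_iter=50):
    """Hypothetical entropy-regularized fuzzy k-means with cluster-dependent
    feature weights. Assumed objective (not taken from the paper):

        sum_i sum_j u_ij * sum_k w_jk * (x_ik - v_jk)**2
          + gamma * sum_i sum_j u_ij * ln(u_ij)
          + delta * sum_j sum_k w_jk * ln(w_jk)

    subject to sum_j u_ij = 1 for each point and sum_k w_jk = 1 for each cluster.
    Returns memberships U (n x c), centers V (c x d), feature weights W (c x d).
    """
    n, d = len(X), len(X[0])
    # Farthest-first initialization (a choice made for this sketch).
    V = [list(X[0])]
    while len(V) < n_clusters:
        far = max(X, key=lambda x: min(_sqdist(x, v) for v in V))
        V.append(list(far))
    W = [[1.0 / d] * d for _ in range(n_clusters)]  # start with uniform feature weights
    U = [[1.0 / n_clusters] * n_clusters for _ in range(n)]

    for _ in range(n_iter):
        # Memberships: softmin over clusters of the feature-weighted distance d_ij.
        for i in range(n):
            dist = [sum(W[j][k] * (X[i][k] - V[j][k]) ** 2 for k in range(d))
                    for j in range(n_clusters)]
            U[i] = _softmin(dist, gamma)
        # Centers: membership-weighted means (w_jk cancels in the stationarity condition).
        for j in range(n_clusters):
            mass = sum(U[i][j] for i in range(n))
            if mass > 0:  # keep the old center if a cluster has lost all membership
                V[j] = [sum(U[i][j] * X[i][k] for i in range(n)) / mass for k in range(d)]
        # Feature weights: softmin over features of the within-cluster dispersion D_jk.
        for j in range(n_clusters):
            disp = [sum(U[i][j] * (X[i][k] - V[j][k]) ** 2 for i in range(n))
                    for k in range(d)]
            W[j] = _softmin(disp, delta)
    return U, V, W


if __name__ == "__main__":
    # Toy data: feature 0 separates two groups, feature 1 is within-group noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [5.0, 0.4], [5.1, 0.8], [4.9, 0.2]]
    U, V, W = weighted_fuzzy_kmeans(X, n_clusters=2)
    print([max(range(2), key=lambda j: row[j]) for row in U])  # [0, 0, 0, 1, 1, 1]
    print(W)  # per-cluster feature weights
```

On this toy data each cluster gives feature 0 a higher weight than feature 1, because feature 0 has the smaller within-cluster dispersion. Both temperatures are scale-dependent: a larger `gamma` makes memberships softer, and a larger `delta` pushes the feature weights toward uniform, while a small `delta` concentrates weight on the least dispersed features.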

