Anomaly Detection and Improvement of Clusters using Enhanced K-Means Algorithm
This work addresses cluster quality and anomaly identification in datasets, but it is incremental because it builds on the standard k-means algorithm.
The paper tackles cluster refinement and anomaly detection by proposing a novel algorithm that reduces intra-cluster variance, achieving a variance reduction of 88.1% and an accuracy improvement of 22.5% on the UCI Wine Quality dataset.
This paper introduces a unified approach to cluster refinement and anomaly detection in datasets. We propose a novel algorithm that iteratively reduces the intra-cluster variance of N clusters until a global minimum is reached, yielding tighter clusters than the standard k-means algorithm. We evaluate the method using intrinsic measures for unsupervised learning, including the silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index, and extend it to anomaly detection by identifying points whose assignment causes a significant variance increase. External validation on synthetic data and the UCI Breast Cancer and UCI Wine Quality datasets employs the Jaccard similarity score, V-measure, and F1 score. Results show variance reductions of 18.7% and 88.1% on the synthetic and Wine Quality datasets, respectively, along with accuracy and F1 score improvements of 22.5% and 20.8% on the Wine Quality dataset.
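The anomaly criterion described above (flag a point when assigning it to a cluster causes a significant variance increase) can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names, the use of the sum of squared errors (SSE) as the variance measure, and the threshold rule with its `factor` parameter are all assumptions made for illustration.

```python
from math import fsum


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(fsum(col) / n for col in zip(*points))


def sse(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return fsum((x - m) ** 2 for p in points for x, m in zip(p, c))


def variance_increase(cluster, point):
    """Growth in the cluster's SSE if `point` were assigned to it."""
    return sse(cluster + [point]) - sse(cluster)


def flag_anomalies(cluster, candidates, factor=3.0):
    """Return the candidates whose assignment would raise the SSE by more
    than `factor` times the cluster's current mean per-point SSE.

    Both the threshold rule and `factor` are illustrative assumptions,
    not values taken from the paper.
    """
    baseline = sse(cluster) / len(cluster)
    return [p for p in candidates
            if variance_increase(cluster, p) > factor * baseline]


# Toy data: four corners of a square plus two candidate points.
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
candidates = [(1.0, 1.5), (9.0, 9.0)]
anomalies = flag_anomalies(cluster, candidates)
print(anomalies)  # [(9.0, 9.0)]
```

For a cluster of n points with centroid c, adding a point x raises the SSE by n/(n+1) times the squared distance between x and c, so the candidate near the centroid barely changes the variance while the distant one dominates it. A practical threshold would need to be tuned per dataset.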