LG | Jan 29, 2025

KNN and K-means in Gini Prametric Spaces

arXiv:2501.18028v3 | 1 citation | ECAI
Originality: Incremental advance
AI Analysis

This work addresses robustness issues in clustering and classification for machine learning practitioners, but it is incremental as it builds on existing algorithms with a new metric.

This paper tackles the problem of noise and outliers in the K-means and KNN algorithms by introducing Gini prametric spaces, which incorporate rank-based measures alongside value-based ones. The authors report superior performance and efficiency on 16 UCI datasets.

This paper introduces enhancements to the K-means and K-nearest neighbors (KNN) algorithms based on the concept of Gini prametric spaces, instead of traditional metric spaces. Unlike standard distance metrics, Gini prametrics incorporate both value-based and rank-based measures, offering robustness to noise and outliers. The main contributions include: (1) a Gini prametric that captures rank information alongside value distances; (2) a Gini K-means algorithm that is provably convergent and resilient to noisy data; and (3) a Gini KNN method that performs competitively with state-of-the-art approaches like Hassanat's distance in noisy environments. Experimental evaluations on 16 UCI datasets demonstrate the superior performance and efficiency of the Gini-based algorithms in clustering and classification tasks. This work opens new directions for rank-based prametrics in machine learning and statistical analysis.
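To make the idea of combining value-based and rank-based information concrete, here is a minimal sketch in Python. It is not the paper's definition of the Gini prametric: the function names (`make_ecdf`, `rank_weighted_dissimilarity`, `knn_predict`) and the specific formula, which weights each coordinate's absolute difference by the difference in empirical ranks, are assumptions chosen only for illustration. Like a prametric, the resulting dissimilarity is non-negative and zero for identical points, but it may also be zero for distinct points and need not satisfy the triangle inequality.

```python
import numpy as np


def make_ecdf(X_train):
    """Return a function mapping a point to its per-feature empirical CDF values.

    Hypothetical helper: ranks are computed against the training sample only.
    """
    sorted_cols = np.sort(np.asarray(X_train, dtype=float), axis=0)
    n, p = sorted_cols.shape

    def ecdf(x):
        x = np.asarray(x, dtype=float)
        # Number of training values <= x[j], divided by n, for each feature j.
        counts = [np.searchsorted(sorted_cols[:, j], x[j], side="right") for j in range(p)]
        return np.array(counts) / n

    return ecdf


def rank_weighted_dissimilarity(x, y, ecdf):
    """Illustrative dissimilarity: sum_j |x_j - y_j| * |F_j(x_j) - F_j(y_j)|.

    Assumed form for this sketch, not the formula from the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.abs(x - y) * np.abs(ecdf(x) - ecdf(y))))


def knn_predict(X_train, y_train, x_query, k=3):
    """Plain majority-vote KNN using the dissimilarity above in place of a metric."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    ecdf = make_ecdf(X_train)
    d = np.array([rank_weighted_dissimilarity(x_query, xi, ecdf) for xi in X_train])
    nearest = np.argsort(d, kind="stable")[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [5.0, 5.0], [5.1, 5.2], [5.2, 4.9]]
    y = [0, 0, 0, 1, 1, 1]
    print(knn_predict(X, y, [0.05, 0.05], k=3))
    print(knn_predict(X, y, [5.0, 5.1], k=3))
```

Because the rank factor is bounded by 1 in each coordinate, an extreme value in one feature is weighted by how far apart the two points sit in the ordering of the sample, not only by their raw gap. This is one simple way to see how rank information can enter a dissimilarity next to the value distance.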

Code Implementations: 1 repo
