LG, ML | Nov 8, 2025

CADM: Cluster-customized Adaptive Distance Metric for Categorical Data Clustering

Taixi Chen, Yiu-ming Cheung, Yiqun Zhang

arXiv:2511.05826v1 | h-index: 6 | Has Code

Originality: Incremental advance

AI Analysis

The paper addresses unreasonable distance measurement in categorical data clustering. The advance is incremental: it builds on existing methods by adapting distances per cluster.

The authors tackle the problem of measuring distances between categorical values in clustering. They propose a cluster-customized adaptive distance metric that accounts for attribute distributions varying across clusters, and report an average rank close to first across fourteen datasets.

An appropriate distance metric is crucial for categorical data clustering, because distances between categorical values cannot be calculated directly. However, the distances between attribute values usually vary across clusters because the value distributions differ from cluster to cluster. This has not been taken into account, which leads to unreasonable distance measurement. Therefore, we propose a cluster-customized distance metric for categorical data clustering, which can competitively update distances based on the different distributions of attributes in each cluster. In addition, we extend the proposed distance metric to mixed data containing both numerical and categorical attributes. Experiments demonstrate the efficacy of the proposed method, i.e., it achieves an average rank close to first across fourteen datasets. The source code is available at https://anonymous.4open.science/r/CADM-47D8
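
The idea that the same attribute value can sit at different distances depending on the cluster can be illustrated with a small sketch. This is not CADM's formulation, which the abstract does not spell out; it swaps in a simple frequency-based object-to-cluster dissimilarity (one minus the relative frequency of the object's value inside the cluster, averaged over attributes). The function names and the toy data are hypothetical.

```python
from collections import Counter

def cluster_value_frequencies(cluster_rows):
    """Per-attribute relative frequency of each value within one cluster."""
    n = len(cluster_rows)
    n_attrs = len(cluster_rows[0])
    freqs = []
    for j in range(n_attrs):
        counts = Counter(row[j] for row in cluster_rows)
        freqs.append({value: c / n for value, c in counts.items()})
    return freqs

def object_cluster_distance(obj, freqs):
    """Illustrative dissimilarity (an assumption, not the paper's metric):
    mean over attributes of 1 - frequency of the object's value in the cluster."""
    return sum(1.0 - f.get(v, 0.0) for v, f in zip(obj, freqs)) / len(obj)

# Toy clusters with different value distributions (made-up data).
cluster_a = [("red", "small"), ("red", "small"), ("red", "large")]
cluster_b = [("blue", "large"), ("green", "large"), ("blue", "small")]

freq_a = cluster_value_frequencies(cluster_a)
freq_b = cluster_value_frequencies(cluster_b)

obj = ("red", "large")
d_a = object_cluster_distance(obj, freq_a)  # "large" is rare in cluster A
d_b = object_cluster_distance(obj, freq_b)  # "large" is common in cluster B
print(d_a, d_b)
```

Here the value "large" contributes 2/3 to the distance from cluster A but only 1/3 to the distance from cluster B, because its frequency differs between the two clusters. The object is still closer to cluster A overall, since "red" never occurs in cluster B.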
