AI · Feb 12, 2022

Towards Continuous Consistency Axiom

arXiv:2202.06015v1 · 5 citations
AI Analysis

This work provides a theoretical foundation for generating labeled datasets to test clustering algorithms, addressing overfitting issues in machine learning research, though it is incremental as it modifies existing axioms rather than introducing a new paradigm.

The paper addresses the inapplicability of Kleinberg's clustering axioms in finite-dimensional Euclidean spaces, particularly for algorithms like k-means, by proposing an alternative axiomatic system with centric and motion consistency axioms. It demonstrates that this system is satisfiable for a hierarchical k-means variant and can be extended to detect concave clusters, enabling the generation of labeled test data for clustering algorithm evaluation.

The development of new machine learning algorithms, especially clustering algorithms, comparative studies of such algorithms, and testing according to software engineering principles all require labeled data sets. While standard benchmarks are available, a broader range of such data sets is needed to avoid overfitting. In this context, theoretical work on the axiomatization of clustering algorithms, especially axioms on clustering-preserving transformations, offers a cheap way to produce labeled data sets from existing ones. However, as we show in this paper, the frequently cited axiomatic system of Kleinberg (2002) is not applicable to finite-dimensional Euclidean spaces, in which many algorithms, such as $k$-means, operate. In particular, the so-called outer-consistency axiom fails upon small changes in datapoint positions, and the inner-consistency axiom holds only for the identity transformation in general settings. Hence we propose an alternative axiomatic system in which Kleinberg's inner-consistency axiom is replaced by a centric consistency axiom and the outer-consistency axiom is replaced by a motion consistency axiom. We demonstrate that the new system is satisfiable for a hierarchical version of $k$-means with auto-adjusted $k$; hence it is not contradictory. Additionally, as $k$-means creates convex clusters only, we demonstrate that it is possible to create a version that detects concave clusters while still satisfying the axiomatic system. A practical application of such an axiomatic system may be the generation of new labeled test data from existing data for testing clustering algorithms.
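To make the idea of a clustering-preserving transformation concrete, here is a minimal sketch in Python. The abstract does not define centric consistency, so the sketch assumes it means moving every point of a cluster toward that cluster's centroid by a common factor `lam` in (0, 1]. The function names (`centroids`, `shrink_toward_centroids`, `nearest_centroid_labels`) and the toy data are hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Point = Tuple[float, ...]


def centroids(points: Sequence[Point], labels: Sequence[Hashable]) -> Dict[Hashable, Point]:
    """Return the mean point of each labeled cluster."""
    sums: Dict[Hashable, List[float]] = {}
    counts: Dict[Hashable, int] = {}
    for p, c in zip(points, labels):
        acc = sums.setdefault(c, [0.0] * len(p))
        for i, x in enumerate(p):
            acc[i] += x
        counts[c] = counts.get(c, 0) + 1
    return {c: tuple(s / counts[c] for s in acc) for c, acc in sums.items()}


def shrink_toward_centroids(
    points: Sequence[Point], labels: Sequence[Hashable], lam: float
) -> List[Point]:
    """Assumed centric transformation: x -> mu + lam * (x - mu) within each cluster."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must lie in (0, 1]")
    mu = centroids(points, labels)
    return [
        tuple(m + lam * (x - m) for x, m in zip(p, mu[c]))
        for p, c in zip(points, labels)
    ]


def nearest_centroid_labels(
    points: Sequence[Point], mu: Dict[Hashable, Point]
) -> List[Hashable]:
    """Assign each point to its closest centroid (the k-means assignment step)."""
    def sqdist(a: Point, b: Point) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [min(mu, key=lambda c: sqdist(p, mu[c])) for p in points]


# Toy labeled data set: two well-separated clusters in the plane.
points = [(0.0, 0.0), (2.0, 0.0), (1.0, 2.0), (10.0, 10.0), (12.0, 10.0), (11.0, 12.0)]
labels = [0, 0, 0, 1, 1, 1]

# Derive a new labeled data set by shrinking each cluster around its centroid.
shrunk = shrink_toward_centroids(points, labels, lam=0.5)

# The nearest-centroid assignment on the transformed data matches the old labels.
relabeled = nearest_centroid_labels(shrunk, centroids(shrunk, labels))
print(relabeled == labels)
```

Under this assumed definition the cluster centroids stay where they were, so the transformed points can reuse the original labels. That is the sense in which such a transformation would yield new labeled test data from existing data.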

