$K$-means with learned metrics

Pablo Groisman, Matthieu Jonckheere, Jordan Serres, Mariela Sued

arXiv:2603.14601 · h-index: 14

AI Analysis

This work addresses a foundational problem in machine learning by offering a unified framework for proving consistency of metric learning procedures. The contribution is incremental, but it extends to applications such as manifold learning and statistical inference.

The paper tackles the problem of k-means clustering when both the distance metric and the underlying measure are unknown and must be estimated, proving continuity and stability results in this setting. It provides new consistency results for several estimators, including those based on Isomap, Fermat geodesic, diffusion, and Wasserstein distances, which were previously unestablished.

We study the Fréchet $k$-means of a metric measure space when both the measure and the distance are unknown and have to be estimated. We prove a general result stating that the $k$-means are continuous with respect to the measured Gromov-Hausdorff topology. In this setting, we also prove a stability result for the Voronoi clusters they determine. We do not assume uniqueness of the set of $k$-means, but when it is unique, the results are stronger. This framework provides a unified approach to proving consistency for a wide range of metric learning procedures. As concrete applications, we obtain new consistency results for several important estimators that were previously unestablished, even when $k=1$. These include $k$-means based on: (i) Isomap and Fermat geodesic distances on manifolds, (ii) diffusion distances, (iii) Wasserstein distances computed with respect to learned ground metrics. Finally, we consider applications beyond the statistical inference paradigm, such as (iv) first passage percolation and (v) discrete approximations of length spaces.
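To make the setting concrete, the following is a minimal Python sketch of $k$-means computed with an estimated distance. It is an illustration under stated assumptions, not the authors' procedure: the distance is an Isomap-style shortest-path distance in a nearest-neighbor graph, the centers are restricted to sample points (a simplification of the Fréchet $k$-means, whose centers may lie anywhere in the space), and the minimizer is found by brute force. The function names `knn_graph_distances`, `empirical_k_means`, and `voronoi_labels` are hypothetical.

```python
import itertools
import math


def knn_graph_distances(points, n_neighbors):
    """Estimate geodesic distances by shortest paths in a symmetrized k-NN graph."""
    n = len(points)
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i in range(n):
        # Euclidean distances from point i to every other point, nearest first.
        nearest = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in nearest[:n_neighbors]:
            dist[i][j] = min(dist[i][j], d)
            dist[j][i] = min(dist[j][i], d)
    # Floyd-Warshall: all-pairs shortest paths in the weighted graph.
    for m in range(n):
        for i in range(n):
            d_im = dist[i][m]
            for j in range(n):
                if d_im + dist[m][j] < dist[i][j]:
                    dist[i][j] = d_im + dist[m][j]
    return dist


def empirical_k_means(dist, k):
    """Brute-force minimizer of the mean squared distance to the nearest center.

    Assumption: centers are searched only among the sample points.
    """
    n = len(dist)
    best_cost, best_centers = math.inf, None
    for centers in itertools.combinations(range(n), k):
        cost = sum(min(dist[i][c] for c in centers) ** 2 for i in range(n)) / n
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers, best_cost


def voronoi_labels(dist, centers):
    """Assign each sample point to the index of its nearest center."""
    return [min(range(len(centers)), key=lambda a: dist[i][centers[a]])
            for i in range(len(dist))]


# Toy data: 12 points on a half circle, where the geodesic differs from the chord.
points = [(math.cos(i * math.pi / 11), math.sin(i * math.pi / 11)) for i in range(12)]
dist = knn_graph_distances(points, n_neighbors=2)
centers, cost = empirical_k_means(dist, k=2)
labels = voronoi_labels(dist, centers)
```

In this toy example the learned distance between the two ends of the arc is longer than the straight-line chord, so the clusters follow the curve rather than the ambient Euclidean geometry. The brute-force search is only practical for very small samples.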
