ML Nov 30, 2023
Choosing the parameter of the Fermat distance: navigating geometry and noise
Frédéric Chazal, Laure Ferraris, Pablo Groisman et al.
The Fermat distance has recently been established as a useful tool for machine learning tasks when a natural distance is not directly available to the practitioner, or to improve on the results given by Euclidean distances by exploiting the geometrical and statistical properties of the dataset. This distance depends on a parameter $α$ that greatly impacts the performance of subsequent tasks. Ideally, the value of $α$ should be large enough to navigate the geometric intricacies inherent to the problem. At the same time, it should remain small enough to avoid the harmful effects of noise during distance estimation. We study both theoretically and through simulations how to select this parameter.
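The abstract does not state the definition, so the following is only a minimal sketch under an assumption: the sample Fermat distance between two data points is taken to be the smallest total cost over paths through the sample, where each hop costs its Euclidean length raised to the power $α$. The function name `sample_fermat_distance` and its arguments are hypothetical, chosen for illustration.

```python
import heapq
import math


def sample_fermat_distance(points, alpha, source, target):
    """Cheapest path cost from points[source] to points[target].

    Assumption (not stated in the abstract): the graph is complete on the
    sample, and the edge between two points costs (Euclidean distance) ** alpha.
    The search is plain Dijkstra, which is valid because all costs are >= 0.
    """
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, i = heapq.heappop(heap)
        if cost > best[i]:
            continue  # stale heap entry
        if i == target:
            return cost
        for j in range(n):
            if j == i:
                continue
            step = math.dist(points[i], points[j]) ** alpha
            if cost + step < best[j]:
                best[j] = cost + step
                heapq.heappush(heap, (best[j], j))
    return best[target]


# Three collinear points: with alpha > 1, short hops are cheaper than one long jump.
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
d_alpha_1 = sample_fermat_distance(pts, 1.0, 0, 2)
d_alpha_2 = sample_fermat_distance(pts, 2.0, 0, 2)
```

With $α = 1$ the direct jump and the two-hop path cost the same, while with $α = 2$ the two-hop path through the middle point is strictly cheaper than the direct jump. This is the mechanism by which larger $α$ favours paths through densely sampled regions, and also why noisy points can distort the estimate.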
ST Mar 15
$K$-means with learned metrics
Pablo Groisman, Matthieu Jonckheere, Jordan Serres et al.
We study the Fréchet $k$-means of a metric measure space when both the measure and the distance are unknown and have to be estimated. We prove a general result stating that the $k$-means are continuous with respect to the measured Gromov-Hausdorff topology. In this situation, we also prove a stability result for the Voronoi clusters they determine. We do not assume uniqueness of the set of $k$-means, but when it is unique, the results are stronger. This framework provides a unified approach to proving consistency for a wide range of metric learning procedures. As concrete applications, we obtain new consistency results for several important estimators that were previously unestablished, even when $k=1$. These include $k$-means based on: (i) Isomap and Fermat geodesic distances on manifolds, (ii) diffusion distances, (iii) Wasserstein distances computed with respect to learned ground metrics. Finally, we consider applications beyond the statistical inference paradigm, such as (iv) first passage percolation and (v) discrete approximations of length spaces.
ML Dec 11, 2020
Intrinsic persistent homology via density-based metric learning
Ximena Fernández, Eugenio Borghini, Gabriel Mindlin et al.
We address the problem of estimating topological features from data in high-dimensional Euclidean spaces under the manifold assumption. Our approach is based on computing the persistent homology of the space of data points endowed with a sample metric known as the Fermat distance. We prove that this metric space converges almost surely to the manifold itself endowed with an intrinsic metric that accounts for both the geometry of the manifold and the density that produces the sample. This fact implies the convergence of the associated persistence diagrams. Using this intrinsic distance when computing persistent homology has advantages such as robustness to outliers in the input data and lower sensitivity to the particular embedding of the underlying manifold in the ambient space. We use these ideas to propose and implement a method for pattern recognition and anomaly detection in time series, which is evaluated in applications to real data.
NA Jul 19, 2004
Adapting the time-step to recover the asymptotic behavior in a blow-up problem
Pablo Groisman
The equation $u_t = Δu + u^p$ with homogeneous Dirichlet boundary conditions has solutions with blow-up if $p > 1$. An adaptive time-step procedure is given to reproduce the asymptotic behavior of the solutions in the numerical approximations. We prove that the numerical method reproduces the blow-up cases, the blow-up rate and the blow-up time. We also localize the numerical blow-up set.
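To illustrate why adapting the time step matters, here is a toy sketch for the ODE $u' = u^p$, which drops the diffusion term. It is not the scheme of the paper: the step rule $τ_j = λ / u_j^{p-1}$, the function name `euler_blowup_time` and the cap used to stop the loop are all assumptions made for illustration. With a fixed step, explicit Euler stays finite at every time; with this shrinking step, the accumulated time converges, giving a finite numerical blow-up time.

```python
def euler_blowup_time(u0, p, lam, cap=1e12):
    """Explicit Euler for u' = u**p with an adaptive step (toy ODE, assumed rule).

    The step tau = lam / u**(p - 1) makes u grow by the factor (1 + lam)
    at every iteration, so the steps form a convergent geometric series.
    The loop stops once u exceeds `cap`; the remaining time is negligible.
    """
    u, t = u0, 0.0
    while u < cap:
        tau = lam / u ** (p - 1)   # smaller steps as the solution grows
        u += tau * u ** p          # explicit Euler update
        t += tau
    return t


# For p = 2 and u0 = 1 the exact blow-up time of u' = u**2 is 1.
numerical_T = euler_blowup_time(u0=1.0, p=2, lam=0.01)
exact_T = 1.0
```

For this toy case the accumulated time is the geometric sum $λ \sum_j (1+λ)^{-j} = 1 + λ$, so the numerical blow-up time approaches the exact value $1$ as $λ \to 0$.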