A Normalized Bottleneck Distance on Persistence Diagrams and Homology Preservation under Dimension Reduction
This work addresses the lack of scale invariance of the bottleneck distance in topological data analysis. It is aimed at researchers in computational topology and dimension reduction, and it is incremental in that it builds on the existing bottleneck distance framework.
The authors address the problem that the bottleneck distance between persistence diagrams can be arbitrarily large for point clouds that differ by a large scaling, a situation typical in dimension reduction. They introduce a normalized bottleneck distance and prove a stability result for it. They show that this new distance yields improved bounds for homology preservation under Johnson-Lindenstrauss projections, metric multidimensional scaling, and biLipschitz maps, and they demonstrate its effectiveness in clustering point clouds sampled from different shapes.
Persistence diagrams (PDs) are used as signatures of point cloud data. Two point clouds can be compared using the bottleneck distance d_B between their PDs. A potential drawback of this pipeline is that point clouds sampled from topologically similar manifolds can have arbitrarily large d_B when there is a large scaling between them. This situation is typical in dimension reduction frameworks. We define, and study properties of, a new scale-invariant distance between PDs termed the normalized bottleneck distance, d_N. In defining d_N, we develop a broader framework called metric decomposition for comparing finite metric spaces of equal cardinality with a bijection. We utilize metric decomposition to prove a stability result for d_N by deriving an explicit bound on the distortion of the bijective map. We then study two popular dimension reduction techniques, Johnson-Lindenstrauss (JL) projections and metric multidimensional scaling (mMDS), and a third class of general biLipschitz mappings. We provide new bounds on how well these dimension reduction techniques preserve homology with respect to d_N. For a JL map f that transforms input X to f(X), we show that d_N(dgm(X), dgm(f(X))) < ε, where dgm(X) is the Vietoris-Rips PD of X and f preserves pairwise distances up to the tolerance 0 < ε < 1. For mMDS, we present new bounds for d_B and d_N between the PDs of X and its projection in terms of the eigenvalues of the covariance matrix. For k-biLipschitz maps, we show that d_N is bounded by the product of (k^2-1)/k and the ratio of diameters of X and f(X). Finally, we use computational experiments to demonstrate the increased effectiveness of the normalized bottleneck distance for clustering sets of point clouds sampled from different shapes.
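
The scaling issue described above can be illustrated with a minimal sketch. It is not the authors' construction: the abstract does not state how d_N is defined, so rescaling each cloud to unit diameter is an assumption made here for illustration, and the helper names (`diameter`, `h0_deaths`, `normalized_h0_deaths`) are hypothetical. The sketch also does not compute a bottleneck distance. It only compares the finite death times of 0-dimensional Vietoris-Rips classes, which equal the edge lengths of a minimum spanning tree, for a point cloud X and a copy Y scaled by a factor of 100.

```python
import math


def diameter(points):
    """Largest pairwise Euclidean distance in the point cloud."""
    return max(math.dist(p, q) for p in points for q in points)


def h0_deaths(points):
    """Finite death times of the 0-dimensional Vietoris-Rips classes.

    These equal the edge lengths of a minimum spanning tree (Prim's
    algorithm below), under the convention that an edge enters the
    filtration at the distance between its endpoints.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    deaths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if step > 0:  # the root joins the tree without an edge
            deaths.append(best[u])
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(deaths)


def normalized_h0_deaths(points):
    """Death times after rescaling to unit diameter (assumed normalization)."""
    d = diameter(points)
    return [t / d for t in h0_deaths(points)]


# Unevenly spaced samples from the unit circle (deterministic perturbation).
angles = [2.0 * math.pi * k / 40 + 0.05 * math.sin(7 * k) for k in range(40)]
X = [(math.cos(a), math.sin(a)) for a in angles]
Y = [(100.0 * x, 100.0 * y) for x, y in X]  # same shape, 100 times larger

raw_gap = max(abs(a - b) for a, b in zip(h0_deaths(X), h0_deaths(Y)))
norm_gap = max(
    abs(a - b) for a, b in zip(normalized_h0_deaths(X), normalized_h0_deaths(Y))
)
print(f"largest gap in raw death times:        {raw_gap:.4f}")
print(f"largest gap in normalized death times: {norm_gap:.2e}")
```

The raw death times of X and Y differ by a gap that grows with the scale factor, even though the two clouds have the same shape. After dividing by the diameter the death times agree up to floating-point error, which is the kind of scale invariance the normalized distance is meant to provide.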