Sarah Percival

LGSep 12, 2023

G-Mapper: Learning a Cover in the Mapper Construction

Enrique Alvarado, Robin Belton, Emily Fischer et al.

The Mapper algorithm is a visualization technique in topological data analysis (TDA) that outputs a graph reflecting the structure of a given dataset. However, the Mapper algorithm requires tuning several parameters in order to generate a ``nice" Mapper graph. This paper focuses on selecting the cover parameter. We present an algorithm that optimizes the cover of a Mapper graph by splitting a cover repeatedly according to a statistical test for normality. Our algorithm is based on G-means clustering which searches for the optimal number of clusters in $k$-means by iteratively applying the Anderson-Darling test. Our splitting procedure employs a Gaussian mixture model to carefully choose the cover according to the distribution of the given data. Experiments for synthetic and real-world datasets demonstrate that our algorithm generates covers so that the Mapper graphs retain the essence of the datasets, while also running significantly faster than a previous iterative method.

4.7CGApr 16

Towards an Optimal Bound for the Interleaving Distance on Mapper Graphs

Erin Wolf Chambers, Ishika Ghosh, Elizabeth Munch et al.

Mapper graphs are widely used tools in topological data analysis and visualization. They can be understood as discrete approximations of Reeb graphs, providing insight into the shape and connectivity of complex data. Given a high-dimensional point cloud together with a real-valued function defined on it, a mapper graph summarizes the induced topological structure: each node represents a local neighborhood, and edges connect nodes whose corresponding neighborhoods overlap. Our focus is the interleaving distance for mapper graphs, arising as a discretized analogue of the interleaving distance for Reeb graphs-a quantity known to be NP-hard to compute. This distance measures how similar two mapper graphs are by quantifying how much they must be ``stretched'' to be made comparable. Recent work introduced a loss function that gives an upper bound on this distance. The loss evaluates how far a given collection of maps, called an assignment, is from being a true interleaving. Importantly, it is computationally tractable, offering a practical way to bound the distance, however the quality of the bound is dependent on the choice of assignment. In this paper, we develop the first framework for bounding the interleaving distance on mapper graphs. We present the bound in two ways: first, by formulating an integer linear program (ILP) that determines whether an $n$-interleaving exists for a given $n$; and second, by constructing an ILP that identifies an assignment with minimal loss for that $n$. We also evaluate the method on small examples where the interleaving distance is known, and on benchmark and simulated datasets, demonstrating the utility of the approach for classification tasks based on mapper graphs.

Sarah Percival

2 Papers