ATCGMEMLDec 23, 2019

Statistical analysis of Mapper for stochastic and multivariate filters

arXiv:1912.10742v23 citations
Originality Synthesis-oriented
AI Analysis

This work addresses a gap in topological data analysis for researchers needing robust statistical methods with multivariate filters, though it is incremental as it builds on prior stability studies.

The paper tackles the lack of statistical guarantees for Mapper in multivariate and stochastic filter settings, providing risk bounds for estimating Reeb spaces with respect to the Gromov-Hausdorff distance and demonstrating applicability through numerical experiments.

Reeb spaces, as well as their discretized versions called Mappers, are common descriptors used in Topological Data Analysis, with plenty of applications in various fields of science, such as computational biology and data visualization, among others. The stability and quantification of the rate of convergence of the Mapper to the Reeb space has been studied a lot in recent works [BBMW19, CO17, CMO18, MW16], focusing on the case where a scalar-valued filter is used for the computation of Mapper. On the other hand, much less is known in the multivariate case, when the codomain of the filter is $\mathbb{R}^p$, and in the general case, when it is a general metric space $(Z, d_Z)$, instead of $\mathbb{R}$. The few results that are available in this setting [DMW17, MW16] can only handle continuous topological spaces and cannot be used as is for finite metric spaces representing data, such as point clouds and distance matrices. In this article, we introduce a slight modification of the usual Mapper construction and we give risk bounds for estimating the Reeb space using this estimator. Our approach applies in particular to the setting where the filter function used to compute Mapper is also estimated from data, such as the eigenfunctions of PCA. Our results are given with respect to the Gromov-Hausdorff distance, computed with specific filter-based pseudometrics for Mappers and Reeb spaces defined in [DMW17]. We finally provide applications of this setting in statistics and machine learning for different kinds of target filters, as well as numerical experiments that demonstrate the relevance of our approach

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes