MGMar 14, 2022
Geometry of DataParvaneh Joharinad, Jürgen Jost
Topological data analysis asks when balls in a metric space $(X,d)$ intersect. Geometric data analysis asks how much balls have to be enlarged to intersect. We connect this principle to the traditional core geometric concept of curvature. This enables us, on one hand, to reconceptualize curvature and link it to the geometric notion of hyperconvexity. On the other hand, we can then also understand methods of topological data analysis from a geometric perspective.
LGJul 25, 2024
IsUMap: Manifold Learning and Data Visualization leveraging Vietoris-Rips filtrationsLukas Silvester Barth, Fatemeh, Fahimi et al.
This work introduces IsUMap, a novel manifold learning technique that enhances data representation by integrating aspects of UMAP and Isomap with Vietoris-Rips filtrations. We present a systematic and detailed construction of a metric representation for locally distorted metric spaces that captures complex data structures more accurately than the previous schemes. Our approach addresses limitations in existing methods by accommodating non-uniform data distributions and intricate local geometries. We validate its performance through extensive experiments on examples of various geometric objects and benchmark real-world datasets, demonstrating significant improvements in representation quality.
LGDec 3, 2025
Probabilistic Foundations of Fuzzy Simplicial Sets for Nonlinear Dimensionality ReductionJanis Keck, Lukas Silvester Barth, Fatemeh et al.
Fuzzy simplicial sets have become an object of interest in dimensionality reduction and manifold learning, most prominently through their role in UMAP. However, their definition through tools from algebraic topology without a clear probabilistic interpretation detaches them from commonly used theoretical frameworks in those areas. In this work we introduce a framework that explains fuzzy simplicial sets as marginals of probability measures on simplicial sets. In particular, this perspective shows that the fuzzy weights of UMAP arise from a generative model that samples Vietoris-Rips filtrations at random scales, yielding cumulative distribution functions of pairwise distances. More generally, the framework connects fuzzy simplicial sets to probabilistic models on the face poset, clarifies the relation between Kullback-Leibler divergence and fuzzy cross-entropy in this setting, and recovers standard t-norms and t-conorms via Boolean operations on the underlying simplicial sets. We then show how new embedding methods may be derived from this framework and illustrate this on an example where we generalize UMAP using Čech filtrations with triplet sampling. In summary, this probabilistic viewpoint provides a unified probabilistic theoretical foundation for fuzzy simplicial sets, clarifies the role of UMAP within this framework, and enables the systematic derivation of new dimensionality reduction methods.
CVSep 16, 2025
Curvature as a tool for evaluating dimensionality reduction and estimating intrinsic dimensionCharlotte Beylier, Parvaneh Joharinad, Jürgen Jost et al.
Utilizing recently developed abstract notions of sectional curvature, we introduce a method for constructing a curvature-based geometric profile of discrete metric spaces. The curvature concept that we use here captures the metric relations between triples of points and other points. More significantly, based on this curvature profile, we introduce a quantitative measure to evaluate the effectiveness of data representations, such as those produced by dimensionality reduction techniques. Furthermore, Our experiments demonstrate that this curvature-based analysis can be employed to estimate the intrinsic dimensionality of datasets. We use this to explore the large-scale geometry of empirical networks and to evaluate the effectiveness of dimensionality reduction techniques.
LGMar 3, 2025
Merging Hazy Sets with m-Schemes: A Geometric Approach to Data VisualizationLukas Silvester Barth, Hannaneh Fahimi, Parvaneh Joharinad et al.
Many machine learning algorithms try to visualize high dimensional metric data in 2D in such a way that the essential geometric and topological features of the data are highlighted. In this paper, we introduce a framework for aggregating dissimilarity functions that arise from locally adjusting a metric through density-aware normalization, as employed in the IsUMap method. We formalize these approaches as m-schemes, a class of methods closely related to t-norms and t-conorms in probabilistic metrics, as well as to composition laws in information theory. These m-schemes provide a flexible and theoretically grounded approach to refining distance-based embeddings.