Yueqi Cao

ML
h-index12
6papers
36citations
Novelty55%
AI Score39

6 Papers

MLApr 19, 2022
Approximating Persistent Homology for Large Datasets

Yueqi Cao, Anthea Monod

Persistent homology is an important methodology in topological data analysis which adapts theory from algebraic topology to data settings. Computing persistent homology produces persistence diagrams, which have been successfully used in diverse domains. Despite its widespread use, persistent homology is simply impossible to compute when a dataset is very large. We study a statistical approach to the problem of computing persistent homology for massive datasets using a multiple subsampling framework and extend it to three summaries of persistent homology: Hölder continuous vectorizations of persistence diagrams; the alternative representation as persistence measures; and standard persistence diagrams. Specifically, we derive finite sample convergence rates for empirical means for persistent homology and practical guidance on interpreting and tuning parameters. We validate our approach through extensive experiments on both synthetic and real-world data. We demonstrate the performance of multiple subsampling in a permutation test to analyze the topological structure of Poincaré embeddings of large lexical databases.

LGFeb 3
Riemannian Neural Optimal Transport

Alessandro Micheli, Yueqi Cao, Anthea Monod et al.

Computational optimal transport (OT) offers a principled framework for generative modeling. Neural OT methods, which use neural networks to learn an OT map (or potential) from data in an amortized way, can be evaluated out of sample after training, but existing approaches are tailored to Euclidean geometry. Extending neural OT to high-dimensional Riemannian manifolds remains an open challenge. In this paper, we prove that any method for OT on manifolds that produces discrete approximations of transport maps necessarily suffers from the curse of dimensionality: achieving a fixed accuracy requires a number of parameters that grows exponentially with the manifold dimension. Motivated by this limitation, we introduce Riemannian Neural OT (RNOT) maps, which are continuous neural-network parameterizations of OT maps on manifolds that avoid discretization and incorporate geometric structure by construction. Under mild regularity assumptions, we prove that RNOT maps approximate Riemannian OT maps with sub-exponential complexity in the dimension. Experiments on synthetic and real datasets demonstrate improved scalability and competitive performance relative to discretization-based baselines.

CVOct 21, 2024
Data-Efficient CLIP-Powered Dual-Branch Networks for Source-Free Unsupervised Domain Adaptation

Yongguang Li, Yueqi Cao, Jindong Li et al.

Source-free Unsupervised Domain Adaptation (SF-UDA) aims to transfer a model's performance from a labeled source domain to an unlabeled target domain without direct access to source samples, addressing critical data privacy concerns. However, most existing SF-UDA approaches assume the availability of abundant source domain samples, which is often impractical due to the high cost of data annotation. To address the dual challenges of limited source data and privacy concerns, we introduce a data-efficient, CLIP-powered dual-branch network (CDBN). This architecture consists of a cross-domain feature transfer branch and a target-specific feature learning branch, leveraging high-confidence target domain samples to transfer text features of source domain categories while learning target-specific soft prompts. By fusing the outputs of both branches, our approach not only effectively transfers source domain category semantic information to the target domain but also reduces the negative impacts of noise and domain gaps during target training. Furthermore, we propose an unsupervised optimization strategy driven by accurate classification and diversity, preserving the classification capability learned from the source domain while generating more confident and diverse predictions in the target domain. CDBN achieves near state-of-the-art performance with far fewer source domain samples than existing methods across 31 transfer tasks on seven datasets.

LGMay 17, 2025
Metric Graph Kernels via the Tropical Torelli Map

Yueqi Cao, Anthea Monod

We propose new graph kernels grounded in the study of metric graphs via tropical algebraic geometry. In contrast to conventional graph kernels that are based on graph combinatorics such as nodes, edges, and subgraphs, our graph kernels are purely based on the geometry and topology of the underlying metric space. A key characterizing property of our construction is its invariance under edge subdivision, making the kernels intrinsically well-suited for comparing graphs that represent different underlying spaces. We develop efficient algorithms for computing these kernels and analyze their complexity, showing that it depends primarily on the genus of the input graphs. Empirically, our kernels outperform existing methods in label-free settings, as demonstrated on both synthetic and real-world benchmark datasets. We further highlight their practical utility through an urban road network classification task.

MLApr 4, 2021
Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures

Yueqi Cao, Athanasios Vlontzos, Luca Schmidtke et al.

Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos, and medical images.

MLMay 26, 2019
Efficient Weingarten Map and Curvature Estimation on Manifolds

Yueqi Cao, Didong Li, Huafei Sun et al.

In this paper, we propose an efficient method to estimate the Weingarten map for point cloud data sampled from manifold embedded in Euclidean space. A statistical model is established to analyze the asymptotic property of the estimator. In particular, we show the convergence rate as the sample size tends to infinity. We verify the convergence rate through simulated data and apply the estimated Weingarten map to curvature estimation and point cloud simplification to multiple real data sets.