LG Feb 14, 2023
Linearized Wasserstein dimensionality reduction with approximation guarantees
Alexander Cloninger, Keaton Hamm, Varun Khurana et al.
We introduce LOT Wassmap, a computationally feasible algorithm to uncover low-dimensional structures in the Wasserstein space. The algorithm is motivated by the observation that many datasets are naturally interpreted as probability measures rather than points in $\mathbb{R}^n$, and that finding low-dimensional descriptions of such datasets requires manifold learning algorithms in the Wasserstein space. Most available algorithms are based on computing the pairwise Wasserstein distance matrix, which can be computationally challenging for large datasets in high dimensions. Our algorithm leverages approximation schemes such as Sinkhorn distances and linearized optimal transport to speed up computations and, in particular, avoids computing a pairwise distance matrix. We provide guarantees on the embedding quality under such approximations, including when explicit descriptions of the probability measures are not available and one must deal with finite samples instead. Experiments demonstrate that LOT Wassmap attains correct embeddings and that the quality improves with increased sample size. We also show how LOT Wassmap significantly reduces the computational cost when compared to algorithms that depend on pairwise distance computations.
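As a rough illustration of the linearized optimal transport idea in the abstract above, the sketch below treats only the simplest case: one-dimensional empirical measures with equal sample counts, where the optimal map from a uniform reference is obtained by sorting. The function name `lot_embed_1d` and the toy translation family are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def lot_embed_1d(samples, dim=1):
    """Embed 1D empirical measures via sorted samples (LOT in 1D) followed by PCA.

    samples: array of shape (N, n), N measures with n equally weighted atoms each.
    Assumption: in 1D the monotone rearrangement (sorting) is the optimal map,
    so no OT solver is needed in this toy setting.
    """
    n = samples.shape[1]
    # Scale so that Euclidean distances between rows equal W_2 distances.
    T = np.sort(samples, axis=1) / np.sqrt(n)
    T_centered = T - T.mean(axis=0, keepdims=True)
    # Truncated SVD of the centered features (PCA); no pairwise distance matrix is formed.
    U, s, _ = np.linalg.svd(T_centered, full_matrices=False)
    return U[:, :dim] * s[:dim]

rng = np.random.default_rng(0)
base = rng.normal(size=200)                   # atoms of a fixed generating measure
shifts = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])
measures = base[None, :] + shifts[:, None]    # translated copies of the base measure
embedding = lot_embed_1d(measures, dim=1).ravel()
```

In this toy family the one-dimensional embedding should reproduce the translation parameters up to a global shift and sign, which is the kind of parameter recovery the abstract describes.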
LG Apr 13, 2022
Wassmap: Wasserstein Isometric Mapping for Image Manifold Learning
Keaton Hamm, Nick Henscheid, Shujie Kang
In this paper, we propose Wasserstein Isometric Mapping (Wassmap), a nonlinear dimensionality reduction technique that provides solutions to some drawbacks in existing global nonlinear dimensionality reduction algorithms in imaging applications. Wassmap represents images via probability measures in Wasserstein space, then uses pairwise Wasserstein distances between the associated measures to produce a low-dimensional, approximately isometric embedding. We show that the algorithm is able to exactly recover parameters of some image manifolds including those generated by translations or dilations of a fixed generating measure. Additionally, we show that a discrete version of the algorithm retrieves parameters from manifolds generated from discrete measures by providing a theoretical bridge to transfer recovery results from functional data to discrete data. Testing of the proposed algorithms on various image data manifolds shows that Wassmap yields good embeddings compared with other global and local techniques.
ML Jun 17, 2022
Riemannian CUR Decompositions for Robust Principal Component Analysis
Keaton Hamm, Mohamed Meskini, HanQin Cai
Robust Principal Component Analysis (PCA) has received massive attention in recent years. It aims to recover a low-rank matrix and a sparse matrix from their sum. This paper proposes a novel nonconvex Robust PCA algorithm, coined Riemannian CUR (RieCUR), which utilizes the ideas of Riemannian optimization and robust CUR decompositions. This algorithm has the same computational complexity as Iterated Robust CUR, which is currently state-of-the-art, but is more robust to outliers. RieCUR is also able to tolerate a significant number of outliers, and is comparable to Accelerated Alternating Projections, which has high outlier tolerance but worse computational complexity than the proposed method. Thus, the proposed algorithm achieves state-of-the-art performance on Robust PCA both in terms of computational complexity and outlier tolerance.
ML Feb 21, 2023
Boosting Nyström Method
Keaton Hamm, Zhaoying Lu, Wenbo Ouyang et al.
The Nyström method is an effective tool to generate low-rank approximations of large matrices, and it is particularly useful for kernel-based learning. To improve the standard Nyström approximation, ensemble Nyström algorithms compute a mixture of Nyström approximations which are generated independently based on column resampling. We propose a new family of algorithms, boosting Nyström, which adaptively generate a sequence of "weak" Nyström approximations (each using a small number of columns), where each approximation aims to compensate for the weaknesses of its predecessor, and then combine them to form one strong approximation. We demonstrate that our boosting Nyström algorithms can yield more efficient and accurate low-rank approximations to kernel matrices. Improvements over the standard and ensemble Nyström methods are illustrated by simulation studies and real-world data analysis.
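For orientation, the standard Nyström approximation that the boosting and ensemble variants build on can be sketched as follows. This shows only the baseline $K \approx C W^{+} C^{\top}$ with uniformly sampled columns; the function name `nystrom`, the Gaussian kernel bandwidth, and the sample sizes are illustrative assumptions, and the boosting scheme itself is not reproduced here.

```python
import numpy as np

def nystrom(K, cols, rcond=1e-10):
    """Standard Nystrom approximation K ~ C W^+ C^T from the selected columns."""
    C = K[:, cols]                      # sampled columns
    W = K[np.ix_(cols, cols)]           # intersection block
    # Small singular values of W are truncated (rcond) for numerical stability.
    return C @ np.linalg.pinv(W, rcond=rcond) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
sq = np.sum(X**2, axis=1)
# Gaussian kernel matrix with an arbitrarily chosen bandwidth.
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / (2 * 5.0))

cols = rng.choice(300, size=40, replace=False)   # uniform column sampling
rel_err = np.linalg.norm(K - nystrom(K, cols)) / np.linalg.norm(K)
```

A single call like this corresponds to one approximation from a fixed column sample; an ensemble method would average several such calls over independent samples.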
ML Nov 14, 2023
Manifold learning in Wasserstein space
Keaton Hamm, Caroline Moosmüller, Bernhard Schmitzer et al.
This paper aims at building the theoretical foundations for manifold learning algorithms in the space of absolutely continuous probability measures $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ with $Ω$ a compact and convex subset of $\mathbb{R}^d$, metrized with the Wasserstein-2 distance $\mathbb{W}$. We begin by introducing a construction of submanifolds $Λ$ in $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ equipped with metric $\mathbb{W}_Λ$, the geodesic restriction of $\mathbb{W}$ to $Λ$. In contrast to other constructions, these submanifolds are not necessarily flat, but still allow for local linearizations in a similar fashion to Riemannian submanifolds of $\mathbb{R}^d$. We then show how the latent manifold structure of $(Λ,\mathbb{W}_Λ)$ can be learned from samples $\{λ_i\}_{i=1}^N$ of $Λ$ and pairwise extrinsic Wasserstein distances $\mathbb{W}$ on $\mathcal{P}_{\mathrm{a.c.}}(Ω)$ only. In particular, we show that the metric space $(Λ,\mathbb{W}_Λ)$ can be asymptotically recovered in the sense of Gromov--Wasserstein from a graph with nodes $\{λ_i\}_{i=1}^N$ and edge weights $\mathbb{W}(λ_i,λ_j)$. In addition, we demonstrate how the tangent space at a sample $λ$ can be asymptotically recovered via spectral analysis of a suitable "covariance operator" using optimal transport maps from $λ$ to sufficiently close and diverse samples $\{λ_i\}_{i=1}^N$. The paper closes with some explicit constructions of submanifolds $Λ$ and numerical examples on the recovery of tangent spaces through spectral analysis.
ML Oct 13, 2023
Wasserstein approximation schemes based on Voronoi partitions
Keaton Hamm, Varun Khurana
We consider structured approximation of measures in Wasserstein space $\mathrm{W}_p(\mathbb{R}^d)$ for $p\in[1,\infty)$ using general measure approximants compactly supported on Voronoi regions derived from a scaled Voronoi partition of $\mathbb{R}^d$. We show that if a full rank lattice $Λ$ is scaled by a factor of $h\in(0,1]$, then the error in approximating a measure based on the Voronoi partition of $hΛ$ is $O(h)$ regardless of $d$ or $p$. We then use a covering argument to show that the error of $N$-term approximations of compactly supported measures is $O(N^{-\frac1d})$, which matches known rates for optimal quantizers and empirical measure approximation in most instances. Additionally, we generalize our construction to nonuniform Voronoi partitions, highlighting the flexibility and robustness of our approach for various measure approximation scenarios. Finally, we extend these results to noncompactly supported measures with sufficient decay. Our findings are pertinent to applications in computer vision and machine learning where measures are used to represent structured data such as images.
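The $O(h)$ behavior can be seen in a toy one-dimensional case, sketched below under simplifying assumptions: the lattice is $h\mathbb{Z}$, the approximant collapses all mass in each Voronoi cell onto the lattice point at its center, and the error is measured in $W_1$ between equally weighted empirical measures. In this setting each atom moves at most $h/2$, so $W_1 \le h/2$. The helper names are hypothetical and this is only one of the approximants the abstract allows.

```python
import numpy as np

def lattice_quantize(x, h):
    """Move each atom to the nearest point of the scaled lattice h*Z (center of its Voronoi cell)."""
    return h * np.round(x / h)

def w1_empirical_1d(x, y):
    """W_1 between two equally weighted 1D empirical measures of equal size (sorted matching)."""
    return np.mean(np.abs(np.sort(x) - np.sort(y)))

rng = np.random.default_rng(0)
x = rng.normal(size=5000)                       # samples standing in for the measure
scales = (1.0, 0.5, 0.25, 0.125)
errors = {h: w1_empirical_1d(x, lattice_quantize(x, h)) for h in scales}
```

Halving $h$ should roughly halve the entries of `errors`, consistent with a linear rate in $h$.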
ML Oct 5, 2023
On Wasserstein distances for affine transformations of random vectors
Keaton Hamm, Andrzej Korzeniowski
We expound on some known lower bounds of the quadratic Wasserstein distance between random vectors in $\mathbb{R}^n$ with an emphasis on affine transformations that have been used in manifold learning of data in Wasserstein space. In particular, we give concrete lower bounds for rotated copies of random vectors in $\mathbb{R}^2$ by computing the Bures metric between the covariance matrices. We also derive upper bounds for compositions of affine maps which yield a fruitful variety of diffeomorphisms applied to an initial data measure. We apply these bounds to various distributions including those lying on a 1-dimensional manifold in $\mathbb{R}^2$ and illustrate the quality of the bounds. Finally, we give a framework for mimicking handwritten digit or alphabet datasets that can be applied in a manifold learning framework.
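A small numerical sketch of the covariance-based lower bound may help. For random vectors with equal means, the squared quadratic Wasserstein distance is bounded below by the squared Bures distance between the covariance matrices, $\operatorname{tr}(A) + \operatorname{tr}(B) - 2\operatorname{tr}\big((A^{1/2} B A^{1/2})^{1/2}\big)$, with equality for Gaussians. The code below evaluates this for a rotated copy of a centered random vector in $\mathbb{R}^2$; the function names and the chosen covariance are illustrative assumptions.

```python
import numpy as np

def psd_sqrt(A):
    """Symmetric PSD square root via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bures_sq(A, B):
    """Squared Bures distance between PSD matrices A and B."""
    rA = psd_sqrt(A)
    cross = psd_sqrt(rA @ B @ rA)
    return max(np.trace(A) + np.trace(B) - 2.0 * np.trace(cross), 0.0)

def rotation(theta):
    """2x2 rotation matrix by angle theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

Sigma = np.diag([4.0, 1.0])                 # covariance of a centered random vector X
R = rotation(np.pi / 2)
# Covariance of the rotated copy RX is R Sigma R^T.
lower_bound_sq = bures_sq(Sigma, R @ Sigma @ R.T)
```

For this covariance and a quarter turn, the bound works out to $(2-1)^2 + (1-2)^2 = 2$, since the two covariances are simultaneously diagonal.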
LG Apr 11, 2024
Persistent Classification: A New Approach to Stability of Data and Adversarial Examples
Brian Bell, Michael Geyer, David Glickenstein et al.
There are a number of hypotheses underlying the existence of adversarial examples for classification problems. These include the high-dimensionality of the data, high codimension in the ambient space of the data manifolds of interest, and that the structure of machine learning models may encourage classifiers to develop decision boundaries close to data points. This article proposes a new framework for studying adversarial examples that does not depend directly on the distance to the decision boundary. Similarly to the smoothed classifier literature, we define a (natural or adversarial) data point to be $(γ,σ)$-stable if the probability of the same classification is at least $γ$ for points sampled in a Gaussian neighborhood of the point with a given standard deviation $σ$. We focus on studying the differences between persistence metrics along interpolants of natural and adversarial points. We show that adversarial examples have significantly lower persistence than natural examples for large neural networks in the context of the MNIST and ImageNet datasets. We connect this lack of persistence with decision boundary geometry by measuring angles of interpolants with respect to decision boundaries. Finally, we connect this approach with robustness by developing a manifold alignment gradient metric and demonstrating the increase in robustness that can be achieved when training with the addition of this metric.
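The $(γ,σ)$-stability definition lends itself to a Monte Carlo estimate, sketched below. The classifier here is a hypothetical toy (the sign of the first coordinate) rather than a neural network, and the names `persistence` and `is_stable`, the sample count, and the thresholds are assumptions for illustration only.

```python
import numpy as np

def persistence(classify, x, sigma, n_samples=10000, seed=0):
    """Monte Carlo estimate of P[classify(x + sigma * Z) == classify(x)] with Z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    noisy = x[None, :] + sigma * rng.normal(size=(n_samples, x.size))
    return np.mean(classify(noisy) == classify(x[None, :])[0])

def is_stable(classify, x, gamma, sigma, **kwargs):
    """Check (gamma, sigma)-stability using the estimated persistence."""
    return persistence(classify, x, sigma, **kwargs) >= gamma

def toy_classifier(X):
    """Hypothetical classifier: label 1 if the first coordinate is positive, else 0."""
    return (X[:, 0] > 0).astype(int)

far = np.array([3.0, 0.0])       # far from the decision boundary x_0 = 0
near = np.array([0.05, 0.0])     # close to the decision boundary
p_far = persistence(toy_classifier, far, sigma=1.0)
p_near = persistence(toy_classifier, near, sigma=1.0)
```

In this toy setting the point near the boundary has persistence close to one half, while the far point is stable for any reasonable $γ$; the abstract's observation is that adversarial examples behave more like the former.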
LG Sep 27, 2025
LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
Ashkan Shahbazi, Chayne Thrash, Yikun Bai et al.
Transformers have proven highly effective across a wide range of modalities. However, the quadratic complexity of the standard softmax attention mechanism poses a fundamental barrier to scaling them to long context windows. A large body of work addresses this with linear attention, which reformulates attention as a kernel function and approximates it with finite feature maps to achieve linear-time computation. Orthogonal to computational scaling, most attention mechanisms -- both quadratic and linear -- produce row-normalized maps that can over-focus on a few tokens, degrading robustness and information flow. Enforcing doubly-stochastic attention alleviates this by balancing token participation across rows and columns, but existing doubly-stochastic attention mechanisms typically introduce substantial overhead, undermining scalability. We propose LOTFormer, a principled attention mechanism that is simultaneously linear-time and doubly-stochastic. Our approach exploits the connection between attention maps and transportation plans between query and key measures. The central idea is to constrain the transport plan to be low-rank by conditioning it on a learnable pivot measure with small support. Concretely, we solve two entropic optimal transport problems (queries $\to$ pivot and pivot $\to$ keys) and compose them into a conditional (glued) coupling. This yields an attention matrix that is provably doubly-stochastic, has rank at most $r \ll n$, and applies to values in $O(nr)$ time without forming the full $n \times n$ map. The pivot locations and masses are learned end-to-end. Empirically, LOTFormer achieves state-of-the-art results on the Long Range Arena benchmark, surpassing prior linear and transport-based attention methods in both accuracy and efficiency.
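The glued-coupling construction can be sketched numerically as below. This is a minimal illustration under several assumptions not taken from the paper: uniform query and key masses, a fixed random pivot (rather than a learned one), a normalized squared-Euclidean cost, and plain Sinkhorn iterations. With couplings $P_1$ (queries to pivot) and $P_2$ (pivot to keys) sharing the pivot marginal $p$, the glued plan is $P_1 \operatorname{diag}(1/p) P_2$, and scaling by $n$ gives a matrix whose rows and columns sum to one.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.5, n_iter=500):
    """Entropic OT plan with row marginals a and column marginals b (plain Sinkhorn)."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def sq_cost(X, Y):
    """Squared Euclidean cost, normalized to [0, 1] for numerical stability."""
    C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return C / C.max()

rng = np.random.default_rng(0)
n, r, d = 64, 8, 16
queries, keys, values = (rng.normal(size=(n, d)) for _ in range(3))
pivot = rng.normal(size=(r, d))      # pivot locations: fixed at random here, not learned
p = np.full(r, 1.0 / r)              # pivot masses: uniform here, not learned
a = b = np.full(n, 1.0 / n)

P1 = sinkhorn(sq_cost(queries, pivot), a, p)   # queries -> pivot, shape (n, r)
P2 = sinkhorn(sq_cost(pivot, keys), p, b)      # pivot -> keys, shape (r, n)

# Apply n * P1 diag(1/p) P2 to the values right-to-left; no n x n matrix is formed.
out = n * (P1 @ ((P2 @ values) / p[:, None]))

# Dense attention matrix, built here only to inspect its row and column sums.
A = n * (P1 / p[None, :]) @ P2
```

Multiplying right-to-left costs on the order of $nrd$ operations for $d$-dimensional values, which is where the linear scaling in $n$ comes from.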
ML Sep 23, 2025
Recovering Wasserstein Distance Matrices from Few Measurements
Muhammad Rana, Abiy Tasissa, HanQin Cai et al.
This paper proposes two algorithms for estimating square Wasserstein distance matrices from a small number of entries. These matrices are used to compute manifold learning embeddings like multidimensional scaling (MDS) or Isomap, but contrary to Euclidean distance matrices, are extremely costly to compute. We analyze matrix completion from upper triangular samples and Nyström completion in which $\mathcal{O}(d\log(d))$ columns of the distance matrices are computed where $d$ is the desired embedding dimension, prove stability of MDS under Nyström completion, and show that it can outperform matrix completion for a fixed budget of sample distances. Finally, we show that classification of the OrganCMNIST dataset from the MedMNIST benchmark is stable on data embedded from the Nyström estimation of the distance matrix even when only 10\% of the columns are computed.
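For context, the downstream step that consumes such a distance matrix, classical multidimensional scaling, can be sketched as follows. The example uses Euclidean distances between random planar points as a stand-in for Wasserstein distances and assumes the full matrix of squared distances is available; the completion algorithms from the abstract are not reproduced, and the function name `classical_mds` is illustrative.

```python
import numpy as np

def classical_mds(D_sq, dim):
    """Classical MDS: embed points from a matrix of squared pairwise distances."""
    n = D_sq.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    B = -0.5 * J @ D_sq @ J                    # double centering yields a Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]            # indices of the top eigenvalues
    return V[:, idx] * np.sqrt(np.clip(w[idx], 0.0, None))

rng = np.random.default_rng(0)
points = rng.normal(size=(50, 2))
D_sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)

Y = classical_mds(D_sq, dim=2)
D_sq_rec = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
```

Because the embedding depends on the distance matrix only through its leading eigenpairs, it is plausible that a good low-rank estimate of the matrix suffices, which is the regime the abstract studies.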
ML Sep 23, 2025
Neighbor Embeddings Using Unbalanced Optimal Transport Metrics
Muhammad Rana, Keaton Hamm
This paper proposes the use of the Hellinger--Kantorovich metric from unbalanced optimal transport (UOT) in a dimensionality reduction and learning (supervised and unsupervised) pipeline. The performance of UOT is compared to that of regular OT and Euclidean-based dimensionality reduction methods on several benchmark datasets including MedMNIST. The experimental results demonstrate that, on average, UOT shows improvement over both Euclidean and OT-based methods as verified by statistical hypothesis tests. In particular, on the MedMNIST datasets, UOT outperforms OT in classification 81\% of the time. For clustering MedMNIST, UOT outperforms OT 83\% of the time and outperforms both other metrics 58\% of the time.
LG Aug 18, 2021
Computing Steiner Trees using Graph Neural Networks
Reyan Ahmed, Md Asadullah Turja, Faryad Darabi Sahneh et al.
Graph neural networks have been successful in many learning problems and real-world applications. A recent line of research explores the power of graph neural networks to solve combinatorial and graph algorithmic problems such as subgraph isomorphism, detecting cliques, and the traveling salesman problem. However, many NP-complete problems are as yet unexplored using this method. In this paper, we tackle the Steiner Tree Problem. We employ four learning frameworks to compute low cost Steiner trees: feed-forward neural networks, graph neural networks, graph convolutional networks, and a graph attention model. We use these frameworks in two fundamentally different ways: 1) to train the models to learn the actual Steiner tree nodes, 2) to train the model to learn good Steiner point candidates to be connected to the constructed tree using a shortest path in a greedy fashion. We illustrate the robustness of our heuristics on several random graph generation models as well as the SteinLib data library. Our findings suggest that the out-of-the-box application of GNN methods does worse than the classic 2-approximation method. However, when combined with a greedy shortest path construction, it does slightly better than the 2-approximation algorithm. This result sheds light on the fundamental capabilities and limitations of graph learning techniques on classical NP-complete problems.
CV Jun 22, 2021
On Matrix Factorizations in Subspace Clustering
Reeshad Arian, Keaton Hamm
This article explores subspace clustering algorithms using CUR decompositions, and examines the effect of various hyperparameters in these algorithms on clustering performance on two real-world benchmark datasets, the Hopkins155 motion segmentation dataset and the Yale face dataset. Extensive experiments are done for a variety of sampling methods and oversampling parameters for these datasets, and some guidelines for parameter choices are given for practical applications.
NA Mar 19, 2021
Mode-wise Tensor Decompositions: Multi-dimensional Generalizations of CUR Decompositions
HanQin Cai, Keaton Hamm, Longxiu Huang et al.
Low rank tensor approximation is a fundamental tool in modern machine learning and data science. In this paper, we study the characterization, perturbation analysis, and an efficient sampling strategy for two primary tensor CUR approximations, namely Chidori and Fiber CUR. We characterize exact tensor CUR decompositions for low multilinear rank tensors. We also present theoretical error bounds of the tensor CUR approximations when (adversarial or Gaussian) noise appears. Moreover, we show that low cost uniform sampling is sufficient for tensor CUR approximations if the tensor has an incoherent structure. Empirical performance evaluations, with both synthetic and real-world datasets, establish the speed advantage of the tensor CUR approximations over other state-of-the-art low multilinear rank tensor approximations.
CV Jan 5, 2021
Robust CUR Decomposition: Theory and Imaging Applications
HanQin Cai, Keaton Hamm, Longxiu Huang et al.
This paper considers the use of Robust PCA in a CUR decomposition framework and applications thereof. Our main algorithms produce a robust version of column-row factorizations of matrices $\mathbf{D}=\mathbf{L}+\mathbf{S}$ where $\mathbf{L}$ is low-rank and $\mathbf{S}$ contains sparse outliers. These methods yield interpretable factorizations at low computational cost, and provide new CUR decompositions that are robust to sparse outliers, in contrast to previous methods. We consider two key imaging applications of Robust PCA: video foreground-background separation and face modeling. This paper examines the qualitative behavior of our Robust CUR decompositions on the benchmark videos and face datasets, and finds that our method works as well as standard Robust PCA while being significantly faster. Additionally, we consider hybrid randomized and deterministic sampling methods which produce a compact CUR decomposition of a given matrix, and apply this to video sequences to produce canonical frames thereof.
ML Oct 14, 2020
Rapid Robust Principal Component Analysis: CUR Accelerated Inexact Low Rank Estimation
HanQin Cai, Keaton Hamm, Longxiu Huang et al.
Robust principal component analysis (RPCA) is a widely used tool for dimension reduction. In this work, we propose a novel non-convex algorithm, coined Iterated Robust CUR (IRCUR), for solving RPCA problems, which dramatically improves the computational efficiency in comparison with the existing algorithms. IRCUR achieves this acceleration by employing CUR decomposition when updating the low rank component, which allows us to obtain an accurate low rank approximation via only three small submatrices. Consequently, IRCUR is able to process only the small submatrices and avoid expensive computing on the full matrix throughout the entire algorithm. Numerical experiments establish the computational advantage of IRCUR over the state-of-the-art algorithms on both synthetic and real-world datasets.
NA Mar 22, 2019
CUR Decompositions, Approximations, and Perturbations
Keaton Hamm, Longxiu Huang
This article discusses a useful tool in dimensionality reduction and low-rank matrix approximation called the CUR decomposition. Various viewpoints of this method in the literature are synergized and are compared and contrasted; included in this is a new characterization of exact CUR decompositions. A novel perturbation analysis is performed on CUR approximations of noisy versions of low-rank matrices, which compares them with the putative CUR decomposition of the underlying low-rank part. Additionally, we give new column and row sampling results which allow one to conclude that a CUR decomposition of a low-rank matrix is attained with high probability. We then illustrate the stability of these sampling methods under the perturbations studied before, and provide numerical illustrations of the methods and bounds discussed.
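As a concrete illustration of an exact CUR decomposition, the sketch below builds a matrix of exact rank $k$, samples rows and columns uniformly, and checks that $A = C U^{+} R$ when the intersection block $U$ has the same rank as $A$. The matrix sizes, sampling counts, and the truncation level passed to the pseudoinverse are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 60, 40, 3
A = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))    # matrix of exact rank k

rows = rng.choice(m, size=6, replace=False)              # uniformly sampled row indices
cols = rng.choice(n, size=5, replace=False)              # uniformly sampled column indices
C = A[:, cols]                                           # column submatrix
R = A[rows, :]                                           # row submatrix
U = A[np.ix_(rows, cols)]                                # intersection of chosen rows and columns

# Pseudoinverse with truncation of numerically zero singular values of U.
A_cur = C @ np.linalg.pinv(U, rcond=1e-10) @ R
rank_matches = np.linalg.matrix_rank(U) == k
exact = rank_matches and np.allclose(A, A_cur)
```

For a generic low-rank matrix like this one, a handful of uniformly sampled rows and columns already gives `rank_matches == True`; for noisy matrices the reconstruction is only approximate, which is where the perturbation analysis in the abstract comes in.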
LG Nov 11, 2017
CUR Decompositions, Similarity Matrices, and Subspace Clustering
Akram Aldroubi, Keaton Hamm, Ahmet Bugra Koku et al.
A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces $\mathscr{U}=\underset{i=1}{\overset{M}\bigcup}S_i$. The similarity matrices thus constructed give the exact clustering in the noise-free case. Additionally, this decomposition gives rise to many distinct similarity matrices from a given set of data, which allow enough flexibility to perform accurate clustering of noisy data. We also show that two known methods for subspace clustering can be derived from the CUR decomposition. An algorithm based on the theoretical construction of similarity matrices is presented, and experiments on synthetic and real data are presented to test the method. Additionally, an adaptation of our CUR based similarity matrices is utilized to provide a heuristic algorithm for subspace clustering; this algorithm yields the best overall performance to date for clustering the Hopkins155 motion segmentation dataset.