Paulina Hoyos

LG
h-index: 21
Papers: 6
Citations: 6
Novelty: 65%
AI Score: 45

6 Papers

LG Mar 29, 2023
The G-invariant graph Laplacian

Eitan Rosen, Paulina Hoyos, Xiuyuan Cheng et al.

Graph Laplacian based algorithms for data lying on a manifold have been proven effective for tasks such as dimensionality reduction, clustering, and denoising. In this work, we consider data sets whose data points lie on a manifold that is closed under the action of a known unitary matrix Lie group G. We propose to construct the graph Laplacian by incorporating the distances between all the pairs of points generated by the action of G on the data set. We call this construction the "G-invariant Graph Laplacian" (G-GL). We show that the G-GL converges to the Laplace-Beltrami operator on the data manifold, while enjoying a significantly improved convergence rate compared to the standard graph Laplacian, which only utilizes the distances between the points in the given data set. Furthermore, we show that the G-GL admits a set of eigenfunctions that have the form of certain products between the group elements and eigenvectors of certain matrices, which can be estimated from the data efficiently using FFT-type algorithms. We demonstrate our construction and its advantages on the problem of filtering data on a noisy manifold closed under the action of the special unitary group SU(2).

LG May 19
Group-Algebraic Tensors: Provably-optimal Equivariant Learning and Physical Symmetry Discovery

Paulina Hoyos, Shashanka Ubaru, Dongsung Huh et al.

We introduce the $\star_G$ tensor algebra, in which any finite group $G$ defines the multiplication rule, making equivariance an intrinsic algebraic property rather than an architectural constraint. The framework rests on three machine-verified theoretical pillars: (i) an Eckart-Young optimality guarantee for the $\star_G$-SVD, the first such result for symmetry-preserving tensor approximation, exact and polynomial-time; (ii) a Kronecker factorization that composes multiple symmetries by replacing $F_G$ with $F_{G_1} \otimes F_{G_2}$ with no architectural redesign; and (iii) a 600-line Lean 4 formalization of the $\star_G$ algebra. The framework provides capabilities that equivariant neural networks (ENNs) structurally cannot: a closed-form per-irreducible-representation decomposition of every prediction, and data-driven discovery of the symmetry group that best fits a dataset. As a non-trivial empirical demonstration, decomposing QM9 molecular geometry over the chiral octahedral subgroup of SO(3) recovers the Wigner-Eckart selection rules of angular momentum from data alone, with no quantum mechanical input: scalar properties are A$_1$-dominated, dipole components are T$_1$-dominated, the isotropic polarizability is uniquely insensitive to $l=1$ as the rank-2-trace decomposition $l=0 \oplus l=2$ requires, and the T$_1$/A$_1$ predictive-power ratio separates vector observables from scalar observables by a factor of five. On full QM9 (130,831 molecules), $\star_G$-SVD with ridge regression provides closed-form predictions at $\sim 50$ to $90\times$ fewer parameters than parameter-matched MLPs. Algebraic equivariance thus complements architectural equivariance not as a faster-better-cheaper alternative but as a different mathematical affordance: provably-optimal symmetry-preserving compression, per-irrep interpretability, and data-driven physical discovery.

LG Mar 28, 2023
Diffusion Maps for Group-Invariant Manifolds

Paulina Hoyos, Joe Kileel

In this article, we consider the manifold learning problem when the data set is invariant under the action of a compact Lie group $K$. Our approach consists of augmenting the data-induced graph Laplacian by integrating over the $K$-orbits of the existing data points, which yields a $K$-invariant graph Laplacian $L$. We prove that $L$ can be diagonalized by using the unitary irreducible representation matrices of $K$, and we provide an explicit formula for computing its eigenvalues and eigenfunctions. In addition, we show that the normalized Laplacian operator $L_N$ converges to the Laplace-Beltrami operator of the data manifold with an improved convergence rate, where the improvement grows with the dimension of the symmetry group $K$. This work extends the steerable graph Laplacian framework of Landa and Shkolnisky from the case of $\operatorname{SO}(2)$ to arbitrary compact Lie groups.
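To make the orbit-integration idea concrete, here is a minimal sketch for the simplest case, $\operatorname{SO}(2)$ acting on points in the plane by rotation. It is not the authors' construction: the integral over each orbit is replaced by an average over a uniform grid of `n_rot` angles, the Gaussian kernel and bandwidth `eps` are assumptions, and the function names are hypothetical. It also builds the unnormalized Laplacian $D - W$, not the normalized operator $L_N$ from the abstract.

```python
import math

def orbit_averaged_affinity(points, eps=0.5, n_rot=64):
    """W[i][j] = mean over n_rot grid rotations R of exp(-||x_i - R x_j||^2 / eps).

    Assumption: a uniform angle grid stands in for the integral over the SO(2) orbit.
    """
    angles = [2 * math.pi * k / n_rot for k in range(n_rot)]
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            total = 0.0
            for t in angles:
                c, s = math.cos(t), math.sin(t)
                rx, ry = c * xj - s * yj, s * xj + c * yj  # rotate x_j by angle t
                total += math.exp(-((xi - rx) ** 2 + (yi - ry) ** 2) / eps)
            W[i][j] = total / n_rot
    return W

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

points = [(1.0, 0.0), (0.0, 2.0), (1.5, 1.5)]
W = orbit_averaged_affinity(points)
L = graph_laplacian(W)
```

Because the angle grid is closed under composition and inverses, rotating any data point by a grid angle leaves `W` unchanged up to rounding, which is the discrete counterpart of the invariance described in the abstract.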

SP Oct 21, 2025
SO(3)-invariant PCA with application to molecular data

Michael Fraiman, Paulina Hoyos, Tamir Bendory et al.

Principal component analysis (PCA) is a fundamental technique for dimensionality reduction and denoising; however, its application to three-dimensional data with arbitrary orientations, common in structural biology, presents significant challenges. A naive approach requires augmenting the dataset with many rotated copies of each sample, incurring prohibitive computational costs. In this paper, we extend PCA to 3D volumetric datasets with unknown orientations by developing an efficient and principled framework for SO(3)-invariant PCA that implicitly accounts for all rotations without explicit data augmentation. By exploiting underlying algebraic structure, we demonstrate that the computation involves only the square root of the total number of covariance entries, resulting in a substantial reduction in complexity. We validate the method on real-world molecular datasets, demonstrating its effectiveness and opening up new possibilities for large-scale, high-dimensional reconstruction problems.

ML Mar 23, 2025
Quantile-Based Randomized Kaczmarz for Corrupted Tensor Linear Systems

Alejandra Castillo, Jamie Haddock, Iryna Hartsock et al.

The reconstruction of tensor-valued signals from corrupted measurements, known as tensor regression, has become essential in many multi-modal applications such as hyperspectral image reconstruction and medical imaging. In this work, we address the tensor linear system problem $\mathcal{A} \mathcal{X}=\mathcal{B}$, where $\mathcal{A}$ is a measurement operator, $\mathcal{X}$ is the unknown tensor-valued signal, and $\mathcal{B}$ contains the measurements, possibly corrupted by arbitrary errors. Such corruption is common in large-scale tensor data, where transmission, sensory, or storage errors are rare per instance but likely over the entire dataset and may be arbitrarily large in magnitude. We extend the Kaczmarz method, a popular iterative algorithm for solving large linear systems, to develop a Quantile Tensor Randomized Kaczmarz (QTRK) method robust to large, sparse corruptions in the observations $\mathcal{B}$. This approach combines the tensor Kaczmarz framework with quantile-based statistics, allowing it to mitigate adversarial corruptions and improve convergence reliability. We also propose and discuss the Masked Quantile Randomized Kaczmarz (mQTRK) variant, which selectively applies partial updates to handle corruptions further. We present convergence guarantees, discuss the advantages and disadvantages of our approaches, and demonstrate the effectiveness of our methods through experiments, including an application for video deblurring.
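The quantile idea can be illustrated in the simpler matrix-vector setting $Ax = b$. The sketch below is not the paper's QTRK or mQTRK: it is a plain-Python quantile randomized Kaczmarz loop for an ordinary linear system, and the quantile level `q`, the iteration count, the test system, and all function names are assumptions chosen for illustration. A sampled row is used for a projection step only if its residual is no larger than the `q`-quantile of all current residuals, so rows with large corruptions tend to be skipped.

```python
import math
import random

def quantile_rk(A, b, q=0.7, iters=1500, seed=0):
    """Quantile randomized Kaczmarz for Ax = b (matrix analogue, illustrative only)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    x = [0.0] * n
    row_norm_sq = [sum(v * v for v in row) for row in A]
    for _ in range(iters):
        # Signed residuals <a_j, x> - b_j for every row j.
        r = [sum(a * xi for a, xi in zip(row, x)) - bj for row, bj in zip(A, b)]
        threshold = sorted(abs(v) for v in r)[int(q * (m - 1))]
        i = rng.randrange(m)
        if abs(r[i]) <= threshold:
            # Standard Kaczmarz step: project x onto the hyperplane <a_i, x> = b_i.
            step = r[i] / row_norm_sq[i]
            x = [xi - step * a for xi, a in zip(x, A[i])]
    return x

def make_corrupted_system(m=100, n=5, n_corrupt=5, size=100.0, seed=1):
    """Gaussian system with a few entries of b shifted by a large error (hypothetical data)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0, 1) for _ in range(n)]
    A = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(m)]
    b = [sum(a * x for a, x in zip(row, x_true)) for row in A]
    for i in rng.sample(range(m), n_corrupt):
        b[i] += size
    return A, b, x_true

A, b, x_true = make_corrupted_system()
x_hat = quantile_rk(A, b)
print("recovery error:", math.dist(x_hat, x_true))
```

Setting `q=1.0` makes the threshold the largest residual, so every sampled row is accepted and the loop reduces to ordinary randomized Kaczmarz, which gives a convenient baseline for comparison on the same corrupted system.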

QUANT-PH Apr 23, 2021
An integer factorization algorithm which uses diffusion as a computational engine

Carlos A. Cadavid, Paulina Hoyos, Jay Jorgenson et al.

In this article we develop an algorithm which computes a divisor of an integer $N$, which is assumed to be neither prime nor the power of a prime. The algorithm uses discrete time heat diffusion on a finite graph. If $N$ has $m$ distinct prime factors, then the probability that our algorithm runs successfully is at least $p(m) = 1-(m+1)/2^{m}$. We compute the computational complexity of the algorithm in terms of classical, or digital, steps and in terms of diffusion steps, which is a concept that we define here. As we will discuss below, we assert that a diffusion step can and should be considered as being comparable to a quantum step for an algorithm which runs on a quantum computer. With this, we prove that our factorization algorithm uses at most $O((\log N)^{2})$ deterministic steps and at most $O((\log N)^{2})$ diffusion steps with an implied constant which is effective. By comparison, Shor's algorithm is known to use at most $O((\log N)^{2}\log (\log N) \log (\log \log N))$ quantum steps on a quantum computer. As an example of our algorithm, we simulate the diffusion computer algorithm on a desktop computer and obtain factorizations of $N=33$ and $N=1363$.
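The success bound $p(m) = 1-(m+1)/2^{m}$ is easy to tabulate. The snippet below only evaluates the formula stated in the abstract with exact fractions; the function name is hypothetical, and the restriction $m \ge 2$ reflects the assumption that $N$ is neither prime nor a prime power.

```python
from fractions import Fraction

def success_lower_bound(m: int) -> Fraction:
    """Lower bound p(m) = 1 - (m + 1) / 2**m for an N with m distinct prime factors."""
    if m < 2:
        raise ValueError("N is assumed to have at least two distinct prime factors")
    return 1 - Fraction(m + 1, 2 ** m)

for m in range(2, 7):
    p = success_lower_bound(m)
    print(m, p, float(p))
```

Both worked examples in the abstract, $N=33 = 3 \cdot 11$ and $N=1363 = 29 \cdot 47$, have $m=2$, where the bound is $1/4$; it rises to $1/2$ at $m=3$ and approaches 1 as $m$ grows.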