Marcelo Hartmann

LG
h-index25
10papers
55citations
Novelty56%
AI Score47

10 Papers

MLAug 16, 2023
Warped geometric information on the optimisation of Euclidean functions

Marcelo Hartmann, Bernardo Williams, Hanlin Yu et al.

We consider the fundamental task of optimising a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use Riemannian geometry notions to redefine the optimisation problem of a function on the Euclidean space to a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computational friendly metric-tensor for which optimal search directions associated with geodesic curves on the manifold becomes easier to compute. Performing optimization along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third-order. In general these approximations to the geodesic curve will not lie on the manifold, however we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimize along the approximate geodesic curves. We cover the related theory, describe a practical optimization algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using 3rd-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts in term of number of iterations until convergence.

LGMar 9, 2023
Scalable Stochastic Gradient Riemannian Langevin Dynamics in Non-Diagonal Metrics

Hanlin Yu, Marcelo Hartmann, Bernardo Williams et al.

Stochastic-gradient sampling methods are often used to perform Bayesian inference on neural networks. It has been observed that the methods in which notions of differential geometry are included tend to have better performances, with the Riemannian metric improving posterior exploration by accounting for the local curvature. However, the existing methods often resort to simple diagonal metrics to remain computationally efficient. This loses some of the gains. We propose two non-diagonal metrics that can be used in stochastic-gradient samplers to improve convergence and exploration but have only a minor computational overhead over diagonal metrics. We show that for fully connected neural networks (NNs) with sparsity-inducing priors and convolutional NNs with correlated priors, using these metrics can provide improvements. For some other choices the posterior is sufficiently easy also for the simpler metrics.

LGNov 5, 2023
Riemannian Laplace Approximation with the Fisher Metric

Hanlin Yu, Marcelo Hartmann, Bernardo Williams et al.

Laplace's method approximates a target density with a Gaussian distribution at its mode. It is computationally efficient and asymptotically exact for Bayesian inference due to the Bernstein-von Mises theorem, but for complex targets and finite-data posteriors it is often too crude an approximation. A recent generalization of the Laplace Approximation transforms the Gaussian approximation according to a chosen Riemannian geometry providing a richer approximation family, while still retaining computational efficiency. However, as shown here, its properties depend heavily on the chosen metric, indeed the metric adopted in previous work results in approximations that are overly narrow as well as being biased even at the limit of infinite data. We correct this shortcoming by developing the approximation family further, deriving two alternative variants that are exact at the limit of infinite data, extending the theoretical analysis of the method, and demonstrating practical improvements in a range of experiments.

18.2LGMar 17
Simplex-to-Euclidean Bijection for Conjugate and Calibrated Multiclass Gaussian Process

Bernardo Williams, Harsha Vardhan Tetali, Arto Klami et al.

We propose a conjugate and calibrated Gaussian process (GP) model for multi-class classification by exploiting the geometry of the probability simplex. Our approach uses Aitchison geometry to map simplex-valued class probabilities to an unconstrained Euclidean representation, turning classification into a GP regression problem with fewer latent dimensions than standard multi-class GP classifiers. This yields conjugate inference and reliable predictive probabilities without relying on distributional approximations in the model construction. The method is compatible with standard sparse GP regression techniques, enabling scalable inference on larger datasets. Empirical results show well-calibrated and competitive performance across synthetic and real-world datasets.

LGOct 31, 2025
Simplex-to-Euclidean Bijections for Categorical Flow Matching

Bernardo Williams, Victor M. Yeom-Song, Marcelo Hartmann et al.

We propose a method for learning and sampling from probability distributions supported on the simplex. Our approach maps the open simplex to Euclidean space via smooth bijections, leveraging the Aitchison geometry to define the mappings, and supports modeling categorical data by a Dirichlet interpolation that dequantizes discrete observations into continuous ones. This enables density modeling in Euclidean space through the bijection while still allowing exact recovery of the original discrete distribution. Compared to previous methods that operate on the simplex using Riemannian geometry or custom noise processes, our approach works in Euclidean space while respecting the Aitchison geometry, and achieves competitive performance on both synthetic and real-world data sets.

LGOct 16, 2025Code
Geometric Moment Alignment for Domain Adaptation via Siegel Embeddings

Shayan Gharib, Marcelo Hartmann, Arto Klami

We address the problem of distribution shift in unsupervised domain adaptation with a moment-matching approach. Existing methods typically align low-order statistical moments of the source and target distributions in an embedding space using ad-hoc similarity measures. We propose a principled alternative that instead leverages the intrinsic geometry of these distributions by adopting a Riemannian distance for this alignment. Our key novelty lies in expressing the first- and second-order moments as a single symmetric positive definite (SPD) matrix through Siegel embeddings. This enables simultaneous adaptation of both moments using the natural geometric distance on the shared manifold of SPD matrices, preserving the mean and covariance structure of the source and target distributions and yielding a more faithful metric for cross-domain comparison. We connect the Riemannian manifold distance to the target-domain error bound, and validate the method on image denoising and image classification benchmarks. Our code is publicly available at https://github.com/shayangharib/GeoAdapt.

LGMay 30, 2025
Learning geometry and topology via multi-chart flows

Hanlin Yu, Søren Hauberg, Marcelo Hartmann et al.

Real world data often lie on low-dimensional Riemannian manifolds embedded in high-dimensional spaces. This motivates learning degenerate normalizing flows that map between the ambient space and a low-dimensional latent space. However, if the manifold has a non-trivial topology, it can never be correctly learned using a single flow. Instead multiple flows must be `glued together'. In this paper, we first propose the general training scheme for learning such a collection of flows, and secondly we develop the first numerical algorithms for computing geodesics on such manifolds. Empirically, we demonstrate that this leads to highly significant improvements in topology estimation.

OCJun 1, 2024
Non-geodesically-convex optimization in the Wasserstein space

Hoang Phuc Hau Luu, Hanlin Yu, Bernardo Williams et al.

We study a class of optimization problems in the Wasserstein space (the space of probability measures) where the objective function is nonconvex along generalized geodesics. Specifically, the objective exhibits some difference-of-convex structure along these geodesics. The setting also encompasses sampling problems where the logarithm of the target distribution is difference-of-convex. We derive multiple convergence insights for a novel semi Forward-Backward Euler scheme under several nonconvex (and possibly nonsmooth) regimes. Notably, the semi Forward-Backward Euler is just a slight modification of the Forward-Backward Euler whose convergence is -- to our knowledge -- still unknown in our very general non-geodesically-convex setting.

MEFeb 1, 2022
Lagrangian Manifold Monte Carlo on Monge Patches

Marcelo Hartmann, Mark Girolami, Arto Klami

The efficiency of Markov Chain Monte Carlo (MCMC) depends on how the underlying geometry of the problem is taken into account. For distributions with strongly varying curvature, Riemannian metrics help in efficient exploration of the target distribution. Unfortunately, they have significant computational overhead due to e.g. repeated inversion of the metric tensor, and current geometric MCMC methods using the Fisher information matrix to induce the manifold are in practice slow. We propose a new alternative Riemannian metric for MCMC, by embedding the target distribution into a higher-dimensional Euclidean space as a Monge patch and using the induced metric determined by direct geometric reasoning. Our metric only requires first-order gradient information and has fast inverse and determinants, and allows reducing the computational complexity of individual iterations from cubic to quadratic in the problem dimensionality. We demonstrate how Lagrangian Monte Carlo in this metric efficiently explores the target distributions.

MLOct 27, 2019
Prior Specification for Bayesian Matrix Factorization via Prior Predictive Matching

Eliezer de Souza da Silva, Tomasz Kuśmierczyk, Marcelo Hartmann et al.

The behavior of many Bayesian models used in machine learning critically depends on the choice of prior distributions, controlled by some hyperparameters that are typically selected by Bayesian optimization or cross-validation. This requires repeated, costly, posterior inference. We provide an alternative for selecting good priors without carrying out posterior inference, building on the prior predictive distribution that marginalizes out the model parameters. We estimate virtual statistics for data generated by the prior predictive distribution and then optimize over the hyperparameters to learn ones for which these virtual statistics match target values provided by the user or estimated from (subset of) the observed data. We apply the principle for probabilistic matrix factorization, for which good solutions for prior selection have been missing. We show that for Poisson factorization models we can analytically determine the hyperparameters, including the number of factors, that best replicate the target statistics, and we study empirically the sensitivity of the approach for model mismatch. We also present a model-independent procedure that determines the hyperparameters for general models by stochastic optimization, and demonstrate this extension in context of hierarchical matrix factorization models.