Alessandro Barp

LG
h-index: 6
14 papers
452 citations
Novelty: 50%
AI Score: 42

14 Papers

ML · Mar 20, 2022
Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents

Alessandro Barp, Lancelot Da Costa, Guilherme França et al.

In this chapter, we identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. Based on this identification, we derive algorithms that exploit these geometric structures to solve these problems efficiently. We show that a wide range of geometric theories emerge naturally in these fields, ranging from measure-preserving processes and information divergences to Poisson geometry and geometric integration. Specifically, we explain how (i) leveraging the symplectic geometry of Hamiltonian systems enables us to construct (accelerated) sampling and optimisation methods, (ii) the theory of Hilbertian subspaces and Stein operators provides a general methodology to obtain robust estimators, (iii) preserving the information geometry of decision-making yields adaptive agents that perform active inference. Throughout, we emphasise the rich connections between these fields; e.g., inference draws on sampling and optimisation, and adaptive decision-making assesses decisions by inferring their counterfactual consequences. Our exposition provides a conceptual overview of underlying ideas, rather than a technical discussion, which can be found in the references herein.

ML · Sep 26, 2022
Targeted Separation and Convergence with Kernel Discrepancies

Alessandro Barp, Carl-Johann Simon-Gabriel, Mark Girolami et al.

Maximum mean discrepancies (MMDs) like the kernel Stein discrepancy (KSD) have grown central to a wide range of applications, including hypothesis testing, sampler selection, distribution approximation, and variational inference. In each setting, these kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or even (ii) control weak convergence to P. In this article we derive new sufficient and necessary conditions to ensure (i) and (ii). For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels and for controlling convergence with bounded kernels. We use these results on $\mathbb{R}^d$ to substantially broaden the known conditions for KSD separation and convergence control and to develop the first KSDs known to exactly metrize weak convergence to P. Along the way, we highlight the implications of our results for hypothesis testing, measuring and improving sample quality, and sampling with Stein variational gradient descent.

ML · Nov 10, 2022
Controlling Moments with Kernel Stein Discrepancies

Heishiro Kanagawa, Alessandro Barp, Arthur Gretton et al.

Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show that standard KSDs used for weak convergence control fail to control moment convergence. To address this limitation, we next provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence. As an immediate consequence we develop, for each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein convergence.
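To make concrete how a KSD can be computed when the target is known only through its score (so the normalizing constant never appears), here is a minimal sketch. It assumes the standard Langevin Stein operator and a Gaussian kernel, both chosen for brevity; it is not the diffusion KSD construction analysed in the paper, and the names `ksd_squared`, `score` and `lengthscale` are hypothetical.

```python
import numpy as np

def ksd_squared(samples, score, lengthscale=1.0):
    """V-statistic estimate of the squared Langevin KSD with a Gaussian kernel.

    samples: array of shape (n, d); score: maps (n, d) -> (n, d) and returns
    the gradient of the log target density (no normalizing constant needed).
    """
    x = np.asarray(samples, dtype=float)
    n, d = x.shape
    s = score(x)                              # s[i] = grad log p(x_i)
    diff = x[:, None, :] - x[None, :, :]      # diff[i, j] = x_i - x_j
    sq = np.sum(diff ** 2, axis=-1)           # squared distances
    h2 = lengthscale ** 2
    k = np.exp(-sq / (2.0 * h2))              # Gaussian kernel matrix

    term_ss = s @ s.T                                     # s(x_i) . s(x_j)
    term_sx = np.einsum("id,ijd->ij", s, diff) / h2       # s(x_i) . grad_y k / k
    term_sy = -np.einsum("jd,ijd->ij", s, diff) / h2      # s(x_j) . grad_x k / k
    term_tr = d / h2 - sq / h2 ** 2                       # trace(grad_x grad_y k) / k

    stein_kernel = k * (term_ss + term_sx + term_sy + term_tr)
    return stein_kernel.mean()

# Illustrative target: standard normal in 2 dimensions, whose score is -x.
rng = np.random.default_rng(0)
on_target = rng.normal(0.0, 1.0, size=(300, 2))
off_target = rng.normal(1.5, 1.0, size=(300, 2))
ksd_on = ksd_squared(on_target, lambda x: -x)
ksd_off = ksd_squared(off_target, lambda x: -x)
```

In this toy setting the shifted sample yields a much larger value than the sample drawn from the target. Whether a given kernel also controls weak or moment convergence is exactly the question the abstract addresses, and this sketch makes no claim about it.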

ML · Aug 16, 2023
Warped geometric information on the optimisation of Euclidean functions

Marcelo Hartmann, Bernardo Williams, Hanlin Yu et al.

We consider the fundamental task of optimising a real-valued function defined in a potentially high-dimensional Euclidean space, such as the loss function in many machine-learning tasks or the logarithm of the probability distribution in statistical inference. We use notions from Riemannian geometry to recast the optimisation of a function on Euclidean space as optimisation on a Riemannian manifold with a warped metric, and then find the function's optimum along this manifold. The warped metric chosen for the search domain induces a computationally friendly metric tensor for which optimal search directions associated with geodesic curves on the manifold become easier to compute. Performing optimization along geodesics is known to be generally infeasible, yet we show that in this specific manifold we can analytically derive Taylor approximations up to third order. In general these approximations to the geodesic curve will not lie on the manifold; however, we construct suitable retraction maps to pull them back onto the manifold. Therefore, we can efficiently optimize along the approximate geodesic curves. We cover the related theory, describe a practical optimization algorithm and empirically evaluate it on a collection of challenging optimisation benchmarks. Our proposed algorithm, using a third-order approximation of geodesics, tends to outperform standard Euclidean gradient-based counterparts in terms of the number of iterations until convergence.

LG · Nov 29, 2023
Improving embedding of graphs with missing data by soft manifolds

Andrea Marinoni, Pietro Liò, Alessandro Barp et al.

Embedding graphs in continuous spaces is a key factor in designing and developing algorithms for automatic information extraction to be applied in diverse tasks (e.g., learning, inferring, predicting). The reliability of graph embeddings directly depends on how closely the geometry of the continuous space matches the graph structure. Manifolds are mathematical structures whose topological spaces can incorporate the characteristics of a graph, in particular the distances between nodes. State-of-the-art manifold-based graph embedding algorithms rely on the assumption that the projection onto a tangent space of each point in the manifold (corresponding to a node in the graph) locally resembles a Euclidean space. Although this condition helps in achieving efficient analytical solutions to the embedding problem, it is not an adequate set-up for modern real-life graphs, which are characterized by weighted connections across nodes often computed over sparse datasets with missing records. In this work, we introduce a new class of manifolds, named soft manifolds, that can address this situation. In particular, soft manifolds are mathematical structures with spherical symmetry where the tangent spaces to each point are hypocycloids whose shape is defined according to the velocity of information propagation across the data points. Using soft manifolds for graph embedding, we can provide continuous spaces to pursue any task in data analysis over complex datasets. Experimental results on reconstruction tasks on synthetic and real datasets show that the proposed approach enables more accurate and reliable characterization of graphs in continuous spaces than the state of the art.

LG · Apr 22, 2024
Deep Learning as Ricci Flow

Anthony Baptista, Alessandro Barp, Tapabrata Chakraborti et al.

Deep neural networks (DNNs) are powerful tools for approximating the distribution of complex data. It is known that data passing through a trained DNN classifier undergoes a series of geometric and topological simplifications. While some progress has been made toward understanding these transformations in neural networks with smooth activation functions, an understanding in the more general setting of non-smooth activation functions, such as the rectified linear unit (ReLU), which tend to perform better, is required. Here we propose that the geometric transformations performed by DNNs during classification tasks have parallels to those expected under Hamilton's Ricci flow, a tool from differential geometry that evolves a manifold by smoothing its curvature, in order to identify its topology. To illustrate this idea, we present a computational framework to quantify the geometric changes that occur as data passes through successive layers of a DNN, and use this framework to motivate a notion of 'global Ricci network flow' that can be used to assess a DNN's ability to disentangle complex data geometries to solve classification problems. By training more than 1,500 DNN classifiers of different widths and depths on synthetic and real-world data, we show that the strength of global Ricci network flow-like behaviour correlates with accuracy for well-trained DNNs, independently of depth, width and data set. Our findings motivate the use of tools from differential and discrete geometry to the problem of explainability in deep learning.

LG · Mar 31
Concept frustration: Aligning human concepts and machine representations

Enrico Parisini, Christopher J. Soelistyo, Ahab Isaac et al.

Aligning human-interpretable concepts with the internal representations learned by modern machine learning systems remains a central challenge for interpretable AI. We introduce a geometric framework for comparing supervised human concepts with unsupervised intermediate representations extracted from foundation model embeddings. Motivated by the role of conceptual leaps in scientific discovery, we formalise the notion of concept frustration: a contradiction that arises when an unobserved concept induces relationships between known concepts that cannot be made consistent within an existing ontology. We develop task-aligned similarity measures that detect concept frustration between supervised concept-based models and unsupervised representations derived from foundation models, and show that the phenomenon is detectable in task-aligned geometry while conventional Euclidean comparisons fail. Under a linear-Gaussian generative model we derive a closed-form expression for Bayes-optimal concept-based classifier accuracy, decomposing predictive signal into known-known, known-unknown and unknown-unknown contributions and identifying analytically where frustration affects performance. Experiments on synthetic data and real language and vision tasks demonstrate that frustration can be detected in foundation model representations and that incorporating a frustrating concept into an interpretable model reorganises the geometry of learned concept representations, to better align human and machine reasoning. These results suggest a principled framework for diagnosing incomplete concept ontologies and aligning human and machine conceptual reasoning, with implications for the development and validation of safe interpretable AI for high-risk applications.

STAT-MECH · Jul 23, 2021
Optimization on manifolds: A symplectic approach

Guilherme França, Alessandro Barp, Mark Girolami et al.

Optimization tasks are crucial in statistical machine learning. Recently, there has been great interest in leveraging tools from dynamical systems to derive accelerated and robust optimization methods via suitable discretizations of continuous-time systems. However, these ideas have mostly been limited to Euclidean spaces and unconstrained settings, or to Riemannian gradient flows. In this work, we propose a dissipative extension of Dirac's theory of constrained Hamiltonian systems as a general framework for solving optimization problems over smooth manifolds, including problems with nonlinear constraints. We develop geometric/symplectic numerical integrators on manifolds that are "rate-matching," i.e., preserve the continuous-time rates of convergence. In particular, we introduce a dissipative RATTLE integrator able to achieve the optimal convergence rate locally. Our class of (accelerated) algorithms is not only simple and efficient but also applicable to a broad range of contexts.

PR · May 6, 2021
A Unifying and Canonical Description of Measure-Preserving Diffusions

Alessandro Barp, So Takao, Michael Betancourt et al.

A complete recipe of measure-preserving diffusions in Euclidean space was recently derived unifying several MCMC algorithms into a single framework. In this paper, we develop a geometric theory that improves and generalises this construction to any manifold. We thereby demonstrate that the completeness result is a direct consequence of the topology of the underlying manifold and the geometry induced by the target measure $P$; there is no need to introduce other structures such as a Riemannian metric, local coordinates, or a reference measure. Instead, our framework relies on the intrinsic geometry of $P$ and in particular its canonical derivative, the deRham rotationnel, which allows us to parametrise the Fokker--Planck currents of measure-preserving diffusions using potentials. The geometric formalism can easily incorporate constraints and symmetries, and deliver new important insights, for example, a new complete recipe of Langevin-like diffusions that are suited to the construction of samplers. We also analyse the reversibility and dissipative properties of the diffusions, the associated deterministic flow on the space of measures, and the geometry of Langevin processes. Our article connects ideas from various literature and frames the theory of measure-preserving diffusions in its appropriate mathematical context.

LG · Jun 16, 2020
Metrizing Weak Convergence with Maximum Mean Discrepancies

Carl-Johann Simon-Gabriel, Alessandro Barp, Bernhard Schölkopf et al.

This paper characterizes the maximum mean discrepancies (MMD) that metrize the weak convergence of probability measures for a wide class of kernels. More precisely, we prove that, on a locally compact, non-compact, Hausdorff space, the MMD of a bounded continuous Borel measurable kernel k, whose reproducing kernel Hilbert space (RKHS) functions vanish at infinity, metrizes the weak convergence of probability measures if and only if k is continuous and integrally strictly positive definite (i.s.p.d.) over all signed, finite, regular Borel measures. We also correct a prior result of Simon-Gabriel & Schölkopf (JMLR, 2018, Thm.12) by showing that there exist both bounded continuous i.s.p.d. kernels that do not metrize weak convergence and bounded continuous non-i.s.p.d. kernels that do metrize it.

ST · Jun 19, 2019
Minimum Stein Discrepancy Estimators

Alessandro Barp, Francois-Xavier Briol, Andrew B. Duncan et al.

When maximum likelihood estimation is infeasible, one often turns to score matching, contrastive divergence, or minimum probability flow to obtain tractable parameter estimates. We provide a unifying perspective of these techniques as minimum Stein discrepancy estimators, and use this lens to design new diffusion kernel Stein discrepancy (DKSD) and diffusion score matching (DSM) estimators with complementary strengths. We establish the consistency, asymptotic normality, and robustness of DKSD and DSM estimators, then derive stochastic Riemannian gradient descent algorithms for their efficient optimisation. The main strength of our methodology is its flexibility, which allows us to design estimators with desirable properties for specific models at hand by carefully selecting a Stein discrepancy. We illustrate this advantage for several challenging problems for score matching, such as non-smooth, heavy-tailed or light-tailed densities.

ME · Jun 13, 2019
Statistical Inference for Generative Models with Maximum Mean Discrepancy

Francois-Xavier Briol, Alessandro Barp, Andrew B. Duncan et al.

While likelihood-based inference and its variants provide a statistically efficient and widely applicable approach to parametric inference, their application to models involving intractable likelihoods poses challenges. In this work, we study a class of minimum distance estimators for intractable generative models, that is, statistical models for which the likelihood is intractable, but simulation is cheap. The distance considered, maximum mean discrepancy (MMD), is defined through the embedding of probability measures into a reproducing kernel Hilbert space. We study the theoretical properties of these estimators, showing that they are consistent, asymptotically normal and robust to model misspecification. A main advantage of these estimators is the flexibility offered by the choice of kernel, which can be used to trade off statistical efficiency and robustness. On the algorithmic side, we study the geometry induced by MMD on the parameter space and use this to introduce a novel natural gradient descent-like algorithm for efficient implementation of these estimators. We illustrate the relevance of our theoretical results on several classes of models including a discrete-time latent Markov process and two multivariate stochastic differential equation models.
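As an illustration of the distance used here, the squared MMD between two samples can be estimated from kernel evaluations alone, which is why it suits models that can be simulated but have no tractable likelihood. The sketch below assumes a Gaussian kernel and the biased V-statistic estimator; the function names and the toy Gaussian "simulator" are hypothetical and are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, lengthscale=1.0):
    """Gaussian kernel matrix k(x_i, y_j) for rows of x (n, d) and y (m, d)."""
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale ** 2))

def mmd_squared(x, y, lengthscale=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, lengthscale)
    k_yy = gaussian_kernel(y, y, lengthscale)
    k_xy = gaussian_kernel(x, y, lengthscale)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()

# Toy example: observed data versus simulations at two candidate parameters.
rng = np.random.default_rng(0)
data = rng.normal(0.0, 1.0, size=(200, 2))
sim_close = rng.normal(0.0, 1.0, size=(200, 2))   # simulator at the true mean
sim_far = rng.normal(2.0, 1.0, size=(200, 2))     # simulator at a wrong mean
mmd_close = mmd_squared(data, sim_close)
mmd_far = mmd_squared(data, sim_far)
```

A minimum MMD estimator would pick the parameter whose simulations minimise this quantity; the `lengthscale` argument is where the efficiency versus robustness trade-off mentioned in the abstract would enter.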

CO · May 9, 2019
Stein Point Markov Chain Monte Carlo

Wilson Ye Chen, Alessandro Barp, François-Xavier Briol et al.

An important task in machine learning and statistics is the approximation of a probability measure by an empirical measure supported on a discrete point set. Stein Points are a class of algorithms for this task, which proceed by sequentially minimising a Stein discrepancy between the empirical measure and the target and, hence, require the solution of a non-convex optimisation problem to obtain each new point. This paper removes the need to solve this optimisation problem by, instead, selecting each new point based on a Markov chain sample path. This significantly reduces the computational cost of Stein Points and leads to a suite of algorithms that are straightforward to implement. The new algorithms are illustrated on a set of challenging Bayesian inference problems, and rigorous theoretical guarantees of consistency are established.

CO · May 8, 2017
Geometry and Dynamics for Markov Chain Monte Carlo

Alessandro Barp, Francois-Xavier Briol, Anthony D. Kennedy et al.

Markov Chain Monte Carlo methods have revolutionised mathematical computation and enabled statistical inference within many previously intractable models. In this context, Hamiltonian dynamics have been proposed as an efficient way of building chains which can explore probability densities efficiently. The method emerges from physics and geometry and these links have been extensively studied by a series of authors through the last thirty years. However, there is currently a gap between the intuitions and knowledge of users of the methodology and our deep understanding of these theoretical foundations. The aim of this review is to provide a comprehensive introduction to the geometric tools used in Hamiltonian Monte Carlo at a level accessible to statisticians, machine learners and other users of the methodology with only a basic understanding of Monte Carlo methods. This will be complemented with some discussion of the most recent advances in the field which we believe will become increasingly relevant to applied scientists.
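For readers who want to see the Hamiltonian dynamics in code, the following is a minimal sketch of one Hamiltonian Monte Carlo transition built on the leapfrog integrator. It assumes an identity mass matrix and a user-supplied potential (the negative log density up to a constant); the names `leapfrog` and `hmc_step` and the Gaussian example are illustrative choices, not code from the review.

```python
import numpy as np

def leapfrog(q, p, grad_potential, step_size, n_steps):
    """Leapfrog integration of Hamilton's equations with identity mass matrix."""
    q, p = q.copy(), p.copy()
    p -= 0.5 * step_size * grad_potential(q)      # initial half step in momentum
    for _ in range(n_steps - 1):
        q += step_size * p                        # full step in position
        p -= step_size * grad_potential(q)        # full step in momentum
    q += step_size * p
    p -= 0.5 * step_size * grad_potential(q)      # final half step in momentum
    return q, p

def hmc_step(q, potential, grad_potential, step_size, n_steps, rng):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.normal(size=q.shape)
    q_new, p_new = leapfrog(q, p0, grad_potential, step_size, n_steps)
    h_old = potential(q) + 0.5 * p0 @ p0
    h_new = potential(q_new) + 0.5 * p_new @ p_new
    if np.log(rng.uniform()) < h_old - h_new:     # Metropolis correction
        return q_new
    return q

# Illustrative target: standard normal in 2 dimensions, U(q) = |q|^2 / 2.
potential = lambda q: 0.5 * q @ q
grad_potential = lambda q: q

q0, p0 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
q1, p1 = leapfrog(q0, p0, grad_potential, step_size=0.1, n_steps=20)
q_back, p_back = leapfrog(q1, -p1, grad_potential, step_size=0.1, n_steps=20)
energy_error = abs((potential(q1) + 0.5 * p1 @ p1) - (potential(q0) + 0.5 * p0 @ p0))

rng = np.random.default_rng(0)
q, chain = np.zeros(2), []
for _ in range(500):
    q = hmc_step(q, potential, grad_potential, 0.2, 10, rng)
    chain.append(q)
chain = np.array(chain)
```

Integrating forward and then again with the momentum negated returns to the starting point, and the energy error stays small. These two properties of the leapfrog scheme, reversibility and near-conservation of the Hamiltonian, are what make the accept/reject step valid and efficient.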