Luc Rey-Bellet

h-index32

17papers

114citations

Novelty56%

AI Score29

Ranked #149,629 of 201,326 authors (top 74%)#2,211 in ML (top 63%)

17 Papers

MLOct 10, 2022

Function-space regularized Rényi divergences

Jeremiah Birrell, Yannis Pantazis, Paul Dupuis et al.

We propose a new family of regularized Rényi divergences parametrized not only by the order $α$ but also by a variational function space. These new objects are defined by taking the infimal convolution of the standard Rényi divergence with the integral probability metric (IPM) associated with the chosen function space. We derive a novel dual variational representation that can be used to construct numerically tractable divergence estimators. This representation avoids risk-sensitive terms and therefore exhibits lower variance, making it well-behaved when $α>1$; this addresses a notable weakness of prior approaches. We prove several properties of these new divergences, showing that they interpolate between the classical Rényi divergences and IPMs. We also study the $α\to\infty$ limit, which leads to a regularized worst-case-regret and a new variational representation in the classical case. Moreover, we show that the proposed regularized Rényi divergences inherit features from IPMs such as the ability to compare distributions that are not absolutely continuous, e.g., empirical measures and distributions with low-dimensional support. We present numerical results on both synthetic and real datasets, showing the utility of these new divergences in both estimation and GAN training applications; in particular, we demonstrate significantly reduced variance and improved training performance.

NAFeb 23, 2016

Efficient estimators for likelihood ratio sensitivity indices of complex stochastic dynamics

Georgios Arampatzis, Markos A. Katsoulakis, Luc Rey-Bellet

We demonstrate that centered likelihood ratio estimators for the sensitivity indices of complex stochastic dynamics are highly efficient with low, constant in time variance and consequently they are suitable for sensitivity analysis in long-time and steady-state regimes. These estimators rely on a new covariance formulation of the likelihood ratio that includes as a submatrix a Fisher Information Matrix for stochastic dynamics and can also be used for fast screening of insensitive parameters and parameter combinations. The proposed methods are applicable to broad classes of stochastic dynamics such as chemical reaction networks, Langevin-type equations and stochastic models in finance, including systems with a high dimensional parameter space and/or disparate decorrelation times between different observables. Furthermore, they are simple to implement as a standard observable in any existing simulation algorithms without additional modifications.

NAApr 15, 2013

Measuring the Irreversibility of Numerical Schemes for Reversible Stochastic Differential Equations

Markos Katsoulakis, Yannis Pantazis, Luc Rey-Bellet

For a Markov process the detailed balance condition is equivalent to the time-reversibility of the process. For stochastic differential equations (SDE's) time discretization numerical schemes usually destroy the property of time-reversibility. Despite an extensive literature on the numerical analysis for SDE's, their stability properties, strong and/or weak error estimates, large deviations and infinite-time estimates, no quantitative results are known on the lack of reversibility of the discrete-time approximation process. In this paper we provide such quantitative estimates by using the concept of entropy production rate, inspired by ideas from non-equilibrium statistical mechanics. The entropy production rate for a stochastic process is defined as the relative entropy (per unit time) of the path measure of the process with respect to the path measure of the time-reversed process. By construction the entropy production rate is nonnegative and it vanishes if and only if the process is reversible. Crucially, from a numerical point of view, the entropy production rate is an {\em a posteriori} quantity, hence it can be computed in the course of a simulation as the ergodic average of a certain functional of the process (the so-called Gallavotti-Cohen (GC) action functional). We compute the entropy production for various numerical schemes such as explicit Euler-Maruyama and explicit Milstein's for reversible SDEs with additive or multiplicative noise. Additionally, we analyze the entropy production for the BBK integrator of the Langevin processes. We show that entropy production is an observable that distinguishes between different numerical schemes in terms of their discretization-induced irreversibility. Furthermore, our results show that the type of the noise critically affects the behavior of the entropy production rate.

MLJul 16, 2024

Combining Wasserstein-1 and Wasserstein-2 proximals: robust manifold learning via well-posed generative flows

Hyemin Gu, Markos A. Katsoulakis, Luc Rey-Bellet et al.

We formulate well-posed continuous-time generative flows for learning distributions that are supported on low-dimensional manifolds through Wasserstein proximal regularizations of $f$-divergences. Wasserstein-1 proximal operators regularize $f$-divergences so that singular distributions can be compared. Meanwhile, Wasserstein-2 proximal operators regularize the paths of the generative flows by adding an optimal transport cost, i.e., a kinetic energy penalization. Via mean-field game theory, we show that the combination of the two proximals is critical for formulating well-posed generative flows. Generative flows can be analyzed through optimality conditions of a mean-field game (MFG), a system of a backward Hamilton-Jacobi (HJ) and a forward continuity partial differential equations (PDEs) whose solution characterizes the optimal generative flow. For learning distributions that are supported on low-dimensional manifolds, the MFG theory shows that the Wasserstein-1 proximal, which addresses the HJ terminal condition, and the Wasserstein-2 proximal, which addresses the HJ dynamics, are both necessary for the corresponding backward-forward PDE system to be well-defined and have a unique solution with provably linear flow trajectories. This implies that the corresponding generative flow is also unique and can therefore be learned in a robust manner even for learning high-dimensional distributions supported on low-dimensional manifolds. The generative flows are learned through adversarial training of continuous-time flows, which bypasses the need for reverse simulation. We demonstrate the efficacy of our approach for generating high-dimensional images without the need to resort to autoencoders or specialized architectures.

MLOct 31, 2022

Lipschitz-regularized gradient flows and generative particle algorithms for high-dimensional scarce data

Hyemin Gu, Panagiota Birmpa, Yannis Pantazis et al.

We build a new class of generative algorithms capable of efficiently learning an arbitrary target distribution from possibly scarce, high-dimensional data and subsequently generate new samples. These generative algorithms are particle-based and are constructed as gradient flows of Lipschitz-regularized Kullback-Leibler or other $f$-divergences, where data from a source distribution can be stably transported as particles, towards the vicinity of the target distribution. As a highlighted result in data integration, we demonstrate that the proposed algorithms correctly transport gene expression data points with dimension exceeding 54K, while the sample size is typically only in the hundreds.

NAOct 17, 2016

Information Criteria for quantifying loss of reversibility in parallelized KMC

Konstantinos Gourgoulias, Markos A. Katsoulakis, Luc Rey-Bellet

Parallel Kinetic Monte Carlo (KMC) is a potent tool to simulate stochastic particle systems efficiently. However, despite literature on quantifying domain decomposition errors of the particle system for this class of algorithms in the short and in the long time regime, no study yet explores and quantifies the loss of time-reversibility in Parallel KMC. Inspired by concepts from non-equilibrium statistical mechanics, we propose the entropy production per unit time, or entropy production rate, given in terms of an observable and a corresponding estimator, as a metric that quantifies the loss of reversibility. Typically, this is a quantity that cannot be computed explicitly for Parallel KMC, which is why we develop a posteriori estimators that have good scaling properties with respect to the size of the system. Through these estimators, we can connect the different parameters of the scheme, such as the communication time step of the parallelization, the choice of the domain decomposition, and the computational schedule, with its performance in controlling the loss of reversibility. From this point of view, the entropy production rate can be seen both as an information criterion to compare the reversibility of different parallel schemes and as a tool to diagnose reversibility issues with a particular scheme. As a demonstration, we use Sandia Lab's SPPARKS software to compare different parallelization schemes and different domain (lattice) decompositions.

GTJul 18, 2011

Decompositions of two player games: potential, zero-sum, and stable games

Sung-Ha Hwang, Luc Rey-Bellet

We introduce several methods of decomposition for two player normal form games. Viewing the set of all games as a vector space, we exhibit explicit orthonormal bases for the subspaces of potential games, zero-sum games, and their orthogonal complements which we call anti-potential games and anti-zero-sum games, respectively. Perhaps surprisingly, every anti-potential game comes either from the Rock-Paper-Scissors type games (in the case of symmetric games) or from the Matching Pennies type games (in the case of asymmetric games). Using these decompositions, we prove old (and some new) cycle criteria for potential and zero-sum games (as orthogonality relations between subspaces). We illustrate the usefulness of our decomposition by (a) analyzing the generalized Rock-Paper-Scissors game, (b) completely characterizing the set of all null-stable games, (c) providing a large class of strict stable games, (d) relating the game decomposition to the decomposition of vector fields for the replicator equations, (e) constructing Lyapunov functions for some replicator dynamics, and (f) constructing Zeeman games -games with an interior asymptotically stable Nash equilibrium and a pure strategy ESS.

NAOct 4, 2016

Information metrics for long-time errors in splitting schemes for stochastic dynamics and parallel KMC

Konstantinos Gourgoulias, Markos A. Katsoulakis, Luc Rey-Bellet

We propose an information-theoretic approach to analyze the long-time behavior of numerical splitting schemes for stochastic dynamics, focusing primarily on Parallel Kinetic Monte Carlo (KMC) algorithms.Established methods for numerical operator splittings provide error estimates in finite-time regimes, in terms of the order of the local error and the associated commutator. Path-space information-theoretic tools such as the relative entropy rate (RER) allow us to control long-time error through commutator calculations. Furthermore, they give rise to an a posteriori representation of the error which can thus be tracked in the course of a simulation. Another outcome of our analysis is the derivation of a path-space information criterion for comparison (and possibly design) of numerical schemes, in analogy to classical information criteria for model selection and discrimination. In the context of Parallel KMC, our analysis allows us to select schemes with improved numerical error and more efficient processor communication. We expect that such a path-space information perspective on numerical methods will be broadly applicable in stochastic dynamics, both for the finite and the long-time regime.

MLMay 24, 2024

Nonlinear denoising score matching for enhanced learning of structured distributions

Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet et al.

We present a novel method for training score-based generative models which uses nonlinear noising dynamics to improve learning of structured distributions. Generalizing to a nonlinear drift allows for additional structure to be incorporated into the dynamics, thus making the training better adapted to the data, e.g., in the case of multimodality or (approximate) symmetries. Such structure can be obtained from the data by an inexpensive preprocessing step. The nonlinear dynamics introduces new challenges into training which we address in two ways: 1) we develop a new nonlinear denoising score matching (NDSM) method, 2) we introduce neural control variates in order to reduce the variance of the NDSM training objective. We demonstrate the effectiveness of this method on several examples: a) a collection of low-dimensional examples, motivated by clustering in latent space, b) high-dimensional images, addressing issues with mode imbalance, small training sets, and approximate symmetries, the latter being a challenge for methods based on equivariant neural networks, which require exact symmetries, c) latent space representation of high-dimensional data, demonstrating improved performance with greatly reduced computational cost. Our method learns score-based generative models with less data by flexibly incorporating structure arising in the dataset.

MLMay 22, 2024

Robust Generative Learning with Lipschitz-Regularized $α$-Divergences Allows Minimal Assumptions on Target Distributions

Ziyu Chen, Hyemin Gu, Markos A. Katsoulakis et al.

This paper demonstrates the robustness of Lipschitz-regularized $α$-divergences as objective functionals in generative modeling, showing they enable stable learning across a wide range of target distributions with minimal assumptions. We establish that these divergences remain finite under a mild condition-that the source distribution has a finite first moment-regardless of the properties of the target distribution, making them adaptable to the structure of target distributions. Furthermore, we prove the existence and finiteness of their variational derivatives, which are essential for stable training of generative models such as GANs and gradient flows. For heavy-tailed targets, we derive necessary and sufficient conditions that connect data dimension, $α$, and tail behavior to divergence finiteness, that also provide insights into the selection of suitable $α$'s. We also provide the first sample complexity bounds for empirical estimations of these divergences on unbounded domains. As a byproduct, we obtain the first sample complexity bounds for empirical estimations of these divergences and the Wasserstein-1 metric with group symmetry on unbounded domains. Numerical experiments confirm that generative models leveraging Lipschitz-regularized $α$-divergences can stably learn distributions in various challenging scenarios, including those with heavy tails or complex, low-dimensional, or fractal support, all without any prior knowledge of the structure of target distributions.

MLMay 22, 2023

Statistical Guarantees of Group-Invariant GANs

Ziyu Chen, Markos A. Katsoulakis, Luc Rey-Bellet et al.

This work presents the first statistical performance guarantees for group-invariant generative models. Many real data, such as images and molecules, are invariant to certain group symmetries, which can be taken advantage of to learn more efficiently as we rigorously demonstrate in this work. Here we specifically study generative adversarial networks (GANs), and quantify the gains when incorporating symmetries into the model. Group-invariant GANs are a type of GANs in which the generators and discriminators are hardwired with group symmetries. Empirical studies have shown that these networks are capable of learning group-invariant distributions with significantly improved data efficiency. In this study, we aim to rigorously quantify this improvement by analyzing the reduction in sample complexity and in the discriminator approximation error for group-invariant GANs. Our findings indicate that when learning group-invariant distributions, the number of samples required for group-invariant GANs decreases proportionally by a factor of the group size and the discriminator approximation error has a reduced lower bound. Importantly, the overall error reduction cannot be achieved merely through data augmentation on the training data. Numerical results substantiate our theory and highlight the stark contrast between learning with group-invariant GANs and using data augmentation. This work also sheds light on the study of other generative models with group symmetries, such as score-based generative models.

LGFeb 2, 2022

Structure-preserving GANs

Jeremiah Birrell, Markos A. Katsoulakis, Luc Rey-Bellet et al.

Generative adversarial networks (GANs), a class of distribution-learning methods based on a two-player game between a generator and a discriminator, can generally be formulated as a minmax problem based on the variational representation of a divergence between the unknown and the generated distributions. We introduce structure-preserving GANs as a data-efficient framework for learning distributions with additional structure such as group symmetry, by developing new variational representations for divergences. Our theory shows that we can reduce the discriminator space to its projection on the invariant discriminator space, using the conditional expectation with respect to the sigma-algebra associated to the underlying structure. In addition, we prove that the discriminator space reduction must be accompanied by a careful design of structured generators, as flawed designs may easily lead to a catastrophic "mode collapse" of the learned distribution. We contextualize our framework by building symmetry-preserving GANs for distributions with intrinsic group symmetry, and demonstrate that both players, namely the equivariant generator and invariant discriminator, play important but distinct roles in the learning process. Empirical experiments and ablation studies across a broad range of data sets, including real-world medical imaging, validate our theory, and show our proposed methods achieve significantly improved sample fidelity and diversity -- almost an order of magnitude measured in Fréchet Inception Distance -- especially in the small data regime.

MLJul 17, 2021

Model Uncertainty and Correctability for Directed Graphical Models

Panagiota Birmpa, Jinchao Feng, Markos A. Katsoulakis et al.

Probabilistic graphical models are a fundamental tool in probabilistic modeling, machine learning and artificial intelligence. They allow us to integrate in a natural way expert knowledge, physical modeling, heterogeneous and correlated data and quantities of interest. For exactly this reason, multiple sources of model uncertainty are inherent within the modular structure of the graphical model. In this paper we develop information-theoretic, robust uncertainty quantification methods and non-parametric stress tests for directed graphical models to assess the effect and the propagation through the graph of multi-sourced model uncertainties to quantities of interest. These methods allow us to rank the different sources of uncertainty and correct the graphical model by targeting its most impactful components with respect to the quantities of interest. Thus, from a machine learning perspective, we provide a mathematically rigorous approach to correctability that guarantees a systematic selection for improvement of components of a graphical model while controlling potential new errors created in the process in other parts of the model. We demonstrate our methods in two physico-chemical examples, namely quantum scale-informed chemical kinetics and materials screening to improve the efficiency of fuel cells.

MLNov 11, 2020

$(f,Γ)$-Divergences: Interpolating between $f$-Divergences and Integral Probability Metrics

Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis et al.

We develop a rigorous and general framework for constructing information-theoretic divergences that subsume both $f$-divergences and integral probability metrics (IPMs), such as the $1$-Wasserstein distance. We prove under which assumptions these divergences, hereafter referred to as $(f,Γ)$-divergences, provide a notion of `distance' between probability measures and show that they can be expressed as a two-stage mass-redistribution/mass-transport process. The $(f,Γ)$-divergences inherit features from IPMs, such as the ability to compare distributions which are not absolutely continuous, as well as from $f$-divergences, namely the strict concavity of their variational representations and the ability to control heavy-tailed distributions for particular choices of $f$. When combined, these features establish a divergence with improved properties for estimation, statistical learning, and uncertainty quantification applications. Using statistical learning as an example, we demonstrate their advantage in training generative adversarial networks (GANs) for heavy-tailed, not-absolutely continuous sample distributions. We also show improved performance and stability over gradient-penalized Wasserstein GAN in image generation.

MLJul 7, 2020

Variational Representations and Neural Network Estimation of Rényi Divergences

Jeremiah Birrell, Paul Dupuis, Markos A. Katsoulakis et al.

We derive a new variational formula for the Rényi family of divergences, $R_α(Q\|P)$, between probability measures $Q$ and $P$. Our result generalizes the classical Donsker-Varadhan variational formula for the Kullback-Leibler divergence. We further show that this Rényi variational formula holds over a range of function spaces; this leads to a formula for the optimizer under very weak assumptions and is also key in our development of a consistency theory for Rényi divergence estimators. By applying this theory to neural-network estimators, we show that if a neural network family satisfies one of several strengthened versions of the universal approximation property then the corresponding Rényi divergence estimator is consistent. In contrast to density-estimator based methods, our estimators involve only expectations under $Q$ and $P$ and hence are more effective in high dimensional systems. We illustrate this via several numerical examples of neural network estimation in systems of up to 5000 dimensions.

NASep 9, 2016

Uncertainty quantification for generalized Langevin dynamics

Eric Joseph Hall, Markos A. Katsoulakis, Luc Rey-Bellet

We present efficient finite difference estimators for goal-oriented sensitivity indices with applications to the generalized Langevin equation (GLE). In particular, we apply these estimators to analyze an extended variable formulation of the GLE where other well known sensitivity analysis techniques such as the likelihood ratio method are not applicable to key parameters of interest. These easily implemented estimators are formed by coupling the nominal and perturbed dynamics appearing in the finite difference through a common driving noise, or common random path. After developing a general framework for variance reduction via coupling, we demonstrate the optimality of the common random path coupling in the sense that it produces a minimal variance surrogate for the difference estimator relative to sampling dynamics driven by independent paths. In order to build intuition for the common random path coupling, we evaluate the efficiency of the proposed estimators for a comprehensive set of examples of interest in particle dynamics. These reduced variance difference estimators are also a useful tool for performing global sensitivity analysis and for investigating non-local perturbations of parameters, such as increasing the number of Prony modes active in an extended variable GLE.

NAAug 1, 2006

Coarse-graining schemes and a posteriori error estimates for stochastic lattice systems

Markos A. Katsoulakis, Petr Plechac, Luc Rey-Bellet et al.

The primary objective of this work is to develop coarse-graining schemes for stochastic many-body microscopic models and quantify their effectiveness in terms of a priori and a posteriori error analysis. In this paper we focus on stochastic lattice systems of interacting particles at equilibrium. %such as Ising-type models. The proposed algorithms are derived from an initial coarse-grained approximation that is directly computable by Monte Carlo simulations, and the corresponding numerical error is calculated using the specific relative entropy between the exact and approximate coarse-grained equilibrium measures. Subsequently we carry out a cluster expansion around this first-and often inadequate-approximation and obtain more accurate coarse-graining schemes. The cluster expansions yield also sharp a posteriori error estimates for the coarse-grained approximations that can be used for the construction of adaptive coarse-graining methods. We present a number of numerical examples that demonstrate that the coarse-graining schemes developed here allow for accurate predictions of critical behavior and hysteresis in systems with intermediate and long-range interactions. We also present examples where they substantially improve predictions of earlier coarse-graining schemes for short-range interactions.