63.0 LG May 28
Diffusion Models Preferentially Memorize Prototypical Examples or: Why Does My Diffusion Model Love Slop?
Marta Aparicio Rodriguez, Anastasia Borovykh, Grigorios A. Pavliotis et al.
Generative models have a persistent limitation: their tendency to memorize training data can create legal liabilities and erode creative diversity. Understanding which samples are memorized in whole or in part, and under what conditions, therefore remains an important open problem. Here we answer the question "Are atypical or rare samples memorized first?" in the negative. We train diffusion models on strings generated according to the production rules of the Random Hierarchy Model (RHM), and find that samples composed of common substrings are preferentially memorized. This holds true even if the training data consists of entirely unique samples, indicating that deduplication at the data point level does not provide a meaningful privacy guarantee. Correspondingly, we predict, then observe, delayed memorization for fat-tailed datasets (i.e., those with more atypical samples). This effect is amplified when fat tails are introduced into high-level production rules. Together, these findings suggest that dataset diversity, particularly at higher levels of abstraction, plays an important role in staving off memorization. Finally, we identify an intermediate regime of partial memorization in which common substrings are learned first and subsequently overproduced during generation. If training is stopped in this regime, models will exhibit the reversion-to-the-mean blandness often derided as "slop".
NA Dec 4, 2012
Optimal non-reversible linear drift for the convergence to equilibrium of a diffusion
Tony Lelièvre, Francis Nier, Grigorios A. Pavliotis
We consider non-reversible perturbations of reversible diffusions that do not alter the invariant distribution and we ask whether there exists an optimal perturbation such that the rate of convergence to equilibrium is maximized. We solve this problem for the case of linear drift by proving the existence of such optimal perturbations and by providing an easily implementable algorithm for constructing them. We discuss in particular the role of the prefactor in the exponential convergence estimate. Our rigorous results are illustrated by numerical experiments.
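To make the idea of a non-reversible linear perturbation concrete, here is a minimal two-dimensional sketch. It assumes the standard linear setting dX = -(I + J) S X dt + sqrt(2) dW with S symmetric positive definite and J antisymmetric; the function names and the example matrix S = diag(1, 4) are illustrative choices, not taken from the paper, and the code does not implement the paper's construction of the optimal perturbation. It only checks two things: that the Gaussian invariant covariance S^{-1} still satisfies the stationary Lyapunov equation, and that the exponential rate (the smallest real part of the eigenvalues of the drift matrix) can increase. The prefactor discussed in the abstract is not captured by this check.

```python
import cmath


def drift_matrix(S, a):
    """Return B = (I + a*R) S for a 2x2 symmetric S, with R = [[0, 1], [-1, 0]].

    a = 0 gives the reversible drift; a != 0 adds an antisymmetric perturbation.
    """
    (s11, s12), (_, s22) = S
    return [[s11 + a * s12, s12 + a * s22],
            [s12 - a * s11, s22 - a * s12]]


def decay_rate(B):
    """Smallest real part of the eigenvalues of a 2x2 matrix B."""
    tr = B[0][0] + B[1][1]
    det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return min(((tr + root) / 2).real, ((tr - root) / 2).real)


def lyapunov_residual(S, a):
    """Max-norm of B*Sigma + Sigma*B^T - 2I with Sigma = S^{-1}.

    A zero residual means the covariance S^{-1} is still stationary.
    """
    (s11, s12), (_, s22) = S
    det_s = s11 * s22 - s12 * s12
    sigma = [[s22 / det_s, -s12 / det_s], [-s12 / det_s, s11 / det_s]]
    B = drift_matrix(S, a)
    # P = B * Sigma; since Sigma is symmetric, Sigma * B^T = P^T.
    P = [[sum(B[i][k] * sigma[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return max(abs(P[i][j] + P[j][i] - (2.0 if i == j else 0.0))
               for i in range(2) for j in range(2))


S = [[1.0, 0.0], [0.0, 4.0]]  # hypothetical example matrix
print("reversible rate:", decay_rate(drift_matrix(S, 0.0)))
print("perturbed rate: ", decay_rate(drift_matrix(S, 1.0)))
print("Lyapunov residual:", lyapunov_residual(S, 1.0))
```

In this example the rate rises from 1 (the smallest eigenvalue of S) to 2.5, which equals Tr(S)/2 here, while the stationary covariance is unchanged.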
OC Sep 27, 2022
Neural parameter calibration for large-scale multi-agent models
Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami
Computational models have become a powerful tool in the quantitative sciences to understand the behaviour of complex systems that evolve in time. However, they often contain a potentially large number of free parameters whose values cannot be obtained from theory but need to be inferred from data. This is especially the case for models in the social sciences, economics, or computational epidemiology. Yet many current parameter estimation methods are mathematically involved and computationally slow to run. In this paper we present a computationally simple and fast method to retrieve accurate probability densities for model parameters using neural differential equations. We present a pipeline comprising multi-agent models acting as forward solvers for systems of ordinary or stochastic differential equations, and a neural network to then extract parameters from the data generated by the model. The two combined create a powerful tool that can quickly estimate densities on model parameters, even for very large systems. We demonstrate the method on synthetic time series data of the SIR model of the spread of infection, and perform an in-depth analysis of the Harris-Wilson model of economic activity on a network, representing a non-convex problem. For the latter, we apply our method both to synthetic data and to data of economic activity across Greater London. We find that our method calibrates the model orders of magnitude more accurately than a previous study of the same dataset using classical techniques, while running between 195 and 390 times faster.
ML Mar 20, 2022
Geometric Methods for Sampling, Optimisation, Inference and Adaptive Agents
Alessandro Barp, Lancelot Da Costa, Guilherme França et al.
In this chapter, we identify fundamental geometric structures that underlie the problems of sampling, optimisation, inference and adaptive decision-making. Based on this identification, we derive algorithms that exploit these geometric structures to solve these problems efficiently. We show that a wide range of geometric theories emerge naturally in these fields, including measure-preserving processes, information divergences, Poisson geometry, and geometric integration. Specifically, we explain how (i) leveraging the symplectic geometry of Hamiltonian systems enables us to construct (accelerated) sampling and optimisation methods, (ii) the theory of Hilbertian subspaces and Stein operators provides a general methodology to obtain robust estimators, (iii) preserving the information geometry of decision-making yields adaptive agents that perform active inference. Throughout, we emphasise the rich connections between these fields; e.g., inference draws on sampling and optimisation, and adaptive decision-making assesses decisions by inferring their counterfactual consequences. Our exposition provides a conceptual overview of underlying ideas, rather than a technical discussion, which can be found in the references herein.
LG Mar 30, 2023
Inferring networks from time series: a neural approach
Thomas Gaskin, Grigorios A. Pavliotis, Mark Girolami
Network structures underlie the dynamics of many complex phenomena, from gene regulation and food webs to power grids and social media. Yet, as they often cannot be observed directly, their connectivities must be inferred from observations of the dynamics to which they give rise. In this work we present a powerful computational method to infer large network adjacency matrices from time series data using a neural network, in order to provide uncertainty quantification on the prediction in a manner that reflects both the degree to which the inference problem is underdetermined and the noise on the data. This is a feature that other approaches have hitherto been lacking. We demonstrate our method's capabilities by inferring line failure locations in the British power grid from its response to a power cut, providing probability densities on each edge and allowing the use of hypothesis testing to make meaningful probabilistic statements about the location of the cut. Our method is significantly more accurate than both Markov-chain Monte Carlo sampling and least squares regression on noisy data and when the problem is underdetermined, while naturally extending to the case of non-linear dynamics, which we demonstrate by learning an entire cost matrix for a non-linear model of economic activity in Greater London. Not having been specifically engineered for network inference, this method in fact represents a general parameter estimation scheme that is applicable to any high-dimensional parameter space.
50.9 OC Apr 30
Linearization-Based Feedback Stabilization of McKean-Vlasov PDEs
Dante Kalise, Lucas M. Moschen, Grigorios A. Pavliotis
We develop a feedback control framework for stabilizing the McKean-Vlasov PDE on the torus. Our goal is to steer the dynamics toward a prescribed stationary distribution or accelerate convergence to it using a time-dependent control potential. We reformulate the controlled PDE in a weighted, zero-mean space and apply the ground-state transform to obtain a Schrödinger-type operator. The resulting operator framework enables spectral analysis, verification of the infinite-dimensional Hautus test, and construction of a Riccati-based feedback law derived from the linearized dynamics, yielding local exponential stabilization with a chosen convergence rate. We rigorously prove local exponential stabilization via maximal regularity arguments and nonlinear estimates. Numerical experiments on well-studied models in one and two dimensions (the noisy Kuramoto model for synchronization, the O(2) spin model in a magnetic field, and the von Mises attractive interaction potential) showcase the effectiveness of our control strategy, demonstrating convergence acceleration and stabilization of unstable equilibria.
DS Oct 29, 2023
Machine Learning for the identification of phase-transitions in interacting agent-based systems: a Desai-Zwanzig example
Nikolaos Evangelou, Dimitrios G. Giovanis, George A. Kevrekidis et al.
Deriving closed-form, analytical expressions for reduced-order models, and judiciously choosing the closures leading to them, has long been the strategy of choice for studying phase- and noise-induced transitions for agent-based models (ABMs). In this paper, we propose a data-driven framework that pinpoints phase transitions for an ABM, the Desai-Zwanzig model in its mean-field limit, using a smaller number of variables than traditional closed-form models. To this end, we use the manifold learning algorithm Diffusion Maps to identify a parsimonious set of data-driven latent variables, and show that they are in one-to-one correspondence with the expected theoretical order parameter of the ABM. We then utilize a deep learning framework to obtain a conformal reparametrization of the data-driven coordinates that facilitates, in our example, the identification of a single parameter-dependent ODE in these coordinates. We identify this ODE through a residual neural network inspired by a numerical integration scheme (forward Euler). We then use the identified ODE, enabled through an odd symmetry transformation, to construct the bifurcation diagram exhibiting the phase transition.
LG Dec 5, 2023
Neural parameter calibration and uncertainty quantification for epidemic forecasting
Thomas Gaskin, Tim Conrad, Grigorios A. Pavliotis et al.
The recent COVID-19 pandemic has thrown the importance of accurately forecasting contagion dynamics and learning infection parameters into sharp focus. At the same time, effective policy-making requires knowledge of the uncertainty on such predictions, in order, for instance, to be able to ready hospitals and intensive care units for a worst-case scenario without needlessly wasting resources. In this work, we apply a novel and powerful computational method to the problem of learning probability densities on contagion parameters and providing uncertainty quantification for pandemic projections. Using a neural network, we calibrate an ODE model to data of the spread of COVID-19 in Berlin in 2020, achieving significantly more accurate calibration and prediction than Markov-Chain Monte Carlo (MCMC)-based sampling schemes. The uncertainties on our predictions provide meaningful confidence intervals, e.g., on infection figures and hospitalisation rates, while training and running the neural scheme takes minutes where MCMC takes hours. We show convergence of our method to the true posterior on a simplified SIR model of epidemics, and also demonstrate our method's learning capabilities on a reduced dataset, where a complex model is learned from a small number of compartments for which data is available.
LG Oct 28, 2025
Identifiable learning of dissipative dynamics
Aiqing Zhu, Beatrice W. Soh, Grigorios A. Pavliotis et al.
Complex dissipative systems appear across science and engineering, from polymers and active matter to learning algorithms. These systems operate far from equilibrium, where energy dissipation and time irreversibility are key to their behavior, but are difficult to quantify from data. Learning accurate and interpretable models of such dynamics remains a major challenge: the models must be expressive enough to describe diverse processes, yet constrained enough to remain physically meaningful and mathematically identifiable. Here, we introduce I-OnsagerNet, a neural framework that learns dissipative stochastic dynamics directly from trajectories while ensuring both interpretability and uniqueness. I-OnsagerNet extends the Onsager principle to guarantee that the learned potential is obtained from the stationary density and that the drift decomposes cleanly into time-reversible and time-irreversible components, as dictated by the Helmholtz decomposition. Our approach enables us to calculate the entropy production and to quantify irreversibility, offering a principled way to detect and quantify deviations from equilibrium. Applications to polymer stretching in elongational flow and to stochastic gradient Langevin dynamics reveal new insights, including super-linear scaling of barrier heights and sub-linear scaling of entropy production rates with the strain rate, and the suppression of irreversibility with increasing batch size. I-OnsagerNet thus establishes a general, data-driven framework for discovering and interpreting non-equilibrium dynamics.
OC Jul 15, 2020
On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh, Nikolas Kantas, Panos Parpas et al.
An open problem in optimization with noisy information is the computation of an exact minimizer that is independent of the amount of noise. A standard practice in stochastic approximation algorithms is to use a decreasing step-size. This, however, leads to slower convergence. A second option is to use a fixed step-size, run independent replicas of the algorithm, and average these. A third option is to run replicas of the algorithm and allow them to interact. It is unclear which of these options works best. To address this question, we reduce the problem of the computation of an exact minimizer with noisy gradient information to the study of stochastic mirror descent with interacting particles. We study the convergence of stochastic mirror descent and make explicit the tradeoffs between communication and variance reduction. We provide theoretical and numerical evidence to suggest that interaction helps to improve convergence and reduce the variance of the estimate.
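The third option, interacting replicas, can be illustrated with a small toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional objective f(x) = 0.5 (x - 2)^2 on x > 0, the entropic mirror map (dual variable z = log x, so x = exp(z)), the mean-field coupling that pulls each dual variable towards the ensemble mean, and all function and parameter names. The sketch shows the general shape of such a scheme: a noisy gradient is evaluated in primal space, the update and the interaction act in dual space, and the coupling leaves the ensemble mean of the dual variables unchanged while shrinking their spread.

```python
import math
import random


def grad_f(x, target=2.0):
    """Gradient of the toy objective f(x) = 0.5 * (x - target)**2."""
    return x - target


def interact(z, theta):
    """Pull every dual variable towards the ensemble mean (mean-preserving)."""
    mean = sum(z) / len(z)
    return [zi + theta * (mean - zi) for zi in z]


def smd_step(z, eta, theta, sigma, rng):
    """One step: noisy gradient at x = exp(z), dual update, then coupling."""
    x = [math.exp(zi) for zi in z]  # map dual variables back to primal space
    z = [zi - eta * (grad_f(xi) + sigma * rng.gauss(0.0, 1.0))
         for zi, xi in zip(z, x)]
    return interact(z, theta)


def run(n_particles=8, n_steps=500, eta=0.1, theta=0.5, sigma=0.0, seed=0):
    """Run the interacting scheme and return the final primal iterates."""
    rng = random.Random(seed)
    z = [rng.uniform(-1.0, 1.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        z = smd_step(z, eta, theta, sigma, rng)
    return [math.exp(zi) for zi in z]


# Noise-free sanity check: every particle should reach the minimizer x = 2.
print(run(sigma=0.0))
# Noisy run: iterates fluctuate around the minimizer.
print(run(sigma=1.0))
```

Setting `theta=0.0` recovers independent replicas, so the same function can be used to compare the spread of the final iterates with and without interaction.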
ML May 10, 2019
The sharp, the flat and the shallow: Can weakly interacting agents learn to escape bad minima?
Nikolas Kantas, Panos Parpas, Grigorios A. Pavliotis
An open problem in machine learning is whether flat minima generalize better and how to compute such minima efficiently. This is a very challenging problem. As a first step towards understanding this question, we formalize it as an optimization problem with weakly interacting agents. We review appropriate background material from the theory of stochastic processes and provide insights that are relevant to practitioners. We propose an algorithmic framework for an extended stochastic gradient Langevin dynamics and illustrate its potential. The paper is written as a tutorial, and presents an alternative use of multi-agent learning. Our primary focus is on the design of algorithms for machine learning applications; however, the underlying mathematical framework is suitable for the understanding of large-scale systems of agent-based models that are popular in the social sciences, economics and finance.
NA Apr 21, 2019
Accelerated convergence to equilibrium and reduced asymptotic variance for Langevin dynamics using Stratonovich perturbations
Assyr Abdulle, Grigorios A. Pavliotis, Gilles Vilmart
In this paper we propose a new approach for sampling from probability measures in possibly high-dimensional spaces. By perturbing the standard overdamped Langevin dynamics by a suitable Stratonovich perturbation that preserves the invariant measure of the original system, we show that accelerated convergence to equilibrium and reduced asymptotic variance can be achieved, thus leading to a computationally advantageous sampling algorithm. The new perturbed Langevin dynamics is reversible with respect to the target probability measure and, consequently, does not suffer from the drawbacks of the nonreversible Langevin samplers that were introduced in [C.-R. Hwang, S.-Y. Hwang-Ma, and S.-J. Sheu, Ann. Appl. Probab., 1993] and studied in, e.g., [T. Lelièvre, F. Nier, and G. A. Pavliotis, J. Stat. Phys., 2013] and [A. B. Duncan, T. Lelièvre, and G. A. Pavliotis, J. Stat. Phys., 2016], while retaining all of their advantages in terms of accelerated convergence and reduced asymptotic variance. In particular, the reversibility of the dynamics ensures that there is no oscillatory transient behaviour. The improved performance of the proposed methodology, in comparison to the standard overdamped Langevin dynamics and its nonreversible perturbation, is illustrated on an example of sampling from a two-dimensional warped Gaussian target distribution.