HEP-LAT Sep 29, 2023
Diffusion Models as Stochastic Quantization in Lattice Field Theory
Lingxiao Wang, Gert Aarts, Kai Zhou
In this work, we establish a direct connection between generative diffusion models (DMs) and stochastic quantization (SQ). The DM is realized by approximating the reversal of a stochastic process dictated by the Langevin equation, generating samples from a prior distribution to effectively mimic the target distribution. Using numerical simulations, we demonstrate that the DM can serve as a global sampler for generating quantum lattice field configurations in two-dimensional $φ^4$ theory. We demonstrate that DMs can notably reduce autocorrelation times in the Markov chain, especially in the critical region where standard Markov Chain Monte-Carlo (MCMC) algorithms experience critical slowing down. The findings can potentially inspire further advancements in lattice field theory simulations, in particular in cases where it is expensive to generate large ensembles.
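As a rough illustration of the stochastic-quantization side of this connection, the sketch below evolves a two-dimensional $φ^4$ field with a discretized Langevin equation. It is not the authors' implementation: the lattice action convention, the parameter names (`m2`, `lam`, `eps`) and all numerical values are assumptions chosen for illustration, and no diffusion model is trained here.

```python
import numpy as np

def drift(phi, m2, lam):
    """Return -dS/dphi for the (assumed) lattice action
    S = sum_x [ 1/2 sum_mu (phi_{x+mu} - phi_x)^2 + (m2/2) phi_x^2 + lam phi_x^4 ]
    on a periodic two-dimensional lattice."""
    # Lattice Laplacian: sum over the four nearest neighbours minus 4 * phi.
    neighbours = sum(np.roll(phi, shift, axis=axis)
                     for axis in (0, 1) for shift in (1, -1))
    laplacian = neighbours - 4.0 * phi
    return laplacian - m2 * phi - 4.0 * lam * phi**3

def langevin(phi, m2, lam, eps, n_steps, rng):
    """Euler-Maruyama discretization of d(phi)/d(tau) = -dS/dphi + noise."""
    for _ in range(n_steps):
        noise = rng.standard_normal(phi.shape)
        phi = phi + eps * drift(phi, m2, lam) + np.sqrt(2.0 * eps) * noise
    return phi

# Illustrative parameters only (hypothetical, not taken from the paper).
rng = np.random.default_rng(0)
phi = langevin(np.zeros((8, 8)), m2=1.0, lam=0.1, eps=0.01, n_steps=500, rng=rng)
```

With a finite step size `eps` the sampled distribution carries a discretization bias, so in practice one would extrapolate in `eps` or add an accept/reject step.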
HEP-LAT Nov 6, 2023
Generative Diffusion Models for Lattice Field Theory
Lingxiao Wang, Gert Aarts, Kai Zhou
This study delves into the connection between machine learning and lattice field theory by linking generative diffusion models (DMs) with stochastic quantization, from a stochastic differential equation perspective. We show that DMs can be conceptualized by reversing a stochastic process driven by the Langevin equation, which then produces samples from an initial distribution to approximate the target distribution. In a toy model, we highlight the capability of DMs to learn effective actions. Furthermore, we demonstrate the feasibility of using DMs as a global sampler for generating configurations in the two-dimensional $φ^4$ quantum lattice field theory.
DIS-NN Jul 23, 2024
Stochastic weight matrix dynamics during learning and Dyson Brownian motion
Gert Aarts, Biagio Lucini, Chanju Park
We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion, thereby inheriting many features of random matrix theory. We relate the level of stochasticity to the ratio of the learning rate and the mini-batch size, providing more robust evidence to a previously conjectured scaling relationship. We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model and in the (near-)solvable case of the Gaussian restricted Boltzmann machine.
HEP-LAT Jan 27
Generalizable Equivariant Diffusion Models for Non-Abelian Lattice Gauge Theory
Gert Aarts, Diaa E. Habibi, Andreas Ipp et al.
We demonstrate that gauge equivariant diffusion models can accurately model the physics of non-Abelian lattice gauge theory using the Metropolis-adjusted annealed Langevin algorithm (MAALA), as exemplified by computations in two-dimensional U(2) and SU(2) gauge theories. Our network architecture is based on lattice gauge equivariant convolutional neural networks (L-CNNs), which respect local and global symmetries on the lattice. Models are trained on a single ensemble generated using a traditional Monte Carlo method. By studying Wilson loops of various sizes as well as the topological susceptibility, we find that the diffusion approach generalizes remarkably well to larger inverse couplings and lattice sizes with negligible loss of accuracy while retaining moderately high acceptance rates.
HEP-LAT Jan 9, 2025
Physics-Driven Learning for Inverse Problems in Quantum Chromodynamics
Gert Aarts, Kenji Fukushima, Tetsuo Hatsuda et al.
The integration of deep learning techniques and physics-driven designs is reforming the way we address inverse problems, in which accurate physical properties are extracted from complex data sets. This is particularly relevant for quantum chromodynamics (QCD), the theory of strong interactions, with its inherent limitations in observational data and demanding computational approaches. This perspective highlights advances and potential of physics-driven learning methods, focusing on predictions of physical quantities towards QCD physics, and drawing connections to machine learning (ML). It is shown that the fusion of ML and physics can lead to more efficient and reliable problem-solving strategies. Key ideas of ML, methodology of embedding physics priors, and generative models as inverse modelling of physical probability distributions are introduced. Specific applications cover first-principle lattice calculations, and QCD physics of hadrons, neutron stars, and heavy-ion collisions. These examples provide a structured and concise overview of how incorporating prior knowledge such as symmetry, continuity and equations into deep learning designs can address diverse inverse problems across different physical sciences.
HEP-LAT Oct 28, 2024
On learning higher-order cumulants in diffusion models
Gert Aarts, Diaa E. Habibi, Lingxiao Wang et al.
To analyse how diffusion models learn correlations beyond Gaussian ones, we study the behaviour of higher-order cumulants, or connected n-point functions, under both the forward and backward process. We derive explicit expressions for the moment- and cumulant-generating functionals, in terms of the distribution of the initial data and properties of the forward process. It is shown analytically that during the forward process higher-order cumulants are conserved in models without a drift, such as the variance-expanding scheme, and that therefore the endpoint of the forward process maintains nontrivial correlations. We demonstrate that since these correlations are encoded in the score function, higher-order cumulants are learnt in the backward process, also when starting from a normal prior. We confirm our analytical results in an exactly solvable toy model with nonzero cumulants and in scalar lattice field theory.
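The conservation statement can be illustrated for a single degree of freedom. The following is a sketch in assumed notation ($x_0$ for the data, $\sigma(t)$ for the noise scale, $\eta$ for a unit normal variable independent of $x_0$, $W_t$ for the cumulant-generating function, $\kappa_n$ for the cumulants); it is a standard property of independent Gaussian noise and not the paper's functional derivation.

```latex
% Drift-free (variance-expanding) forward process for one variable
x_t = x_0 + \sigma(t)\,\eta, \qquad \eta \sim \mathcal{N}(0,1)

% Cumulant-generating function: independence of x_0 and \eta makes it additive
W_t(J) = \ln \mathbb{E}\!\left[e^{J x_t}\right]
       = W_0(J) + \tfrac{1}{2}\,\sigma^2(t)\,J^2

% Only the second cumulant changes; all higher ones are conserved
\kappa_2(t) = \kappa_2(0) + \sigma^2(t), \qquad
\kappa_n(t) = \kappa_n(0) \quad (n \geq 3)
```

The Gaussian noise contributes only a term quadratic in $J$, which is why the non-Gaussian structure of the data survives to the endpoint of a drift-free forward process.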
HEP-LAT Feb 8, 2025
Physics-Conditioned Diffusion Models for Lattice Gauge Theory
Qianteng Zhu, Gert Aarts, Wei Wang et al.
We develop diffusion models for simulating lattice gauge theories, where stochastic quantization is explicitly incorporated as a physical condition for sampling. We demonstrate the applicability of this novel sampler to U(1) gauge theory in two spacetime dimensions and find that a model trained at a small inverse coupling constant can be extrapolated to larger inverse coupling regions without encountering the topological freezing problem. Additionally, the trained model can be employed to sample configurations on different lattice sizes without requiring further training. The exactness of the generated samples is ensured by incorporating Metropolis-adjusted Langevin dynamics into the generation process. Furthermore, we demonstrate that this approach enables more efficient sampling of topological quantities compared to traditional algorithms such as Hybrid Monte Carlo and Langevin simulations.
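To make the role of the Metropolis adjustment concrete, here is a minimal sketch of a generic Metropolis-adjusted Langevin step, tested on a Gaussian toy action. It is not the sampler of the paper: there is no gauge field and no trained model, and the function names (`mala_step`, `action`, `grad`) and the step size are hypothetical choices for illustration.

```python
import numpy as np

def mala_step(x, action, grad, eps, rng):
    """One Metropolis-adjusted Langevin step targeting exp(-action(x))."""
    def log_q(x_to, x_from):
        # Log density (up to a constant) of proposing x_to from x_from.
        mean = x_from - eps * grad(x_from)
        return -np.sum((x_to - mean) ** 2) / (4.0 * eps)

    # Langevin proposal: drift along -grad(action) plus Gaussian noise.
    proposal = x - eps * grad(x) + np.sqrt(2.0 * eps) * rng.standard_normal(x.shape)
    # Metropolis-Hastings ratio, including the asymmetric proposal density.
    log_alpha = (action(x) - action(proposal)
                 + log_q(x, proposal) - log_q(proposal, x))
    accept = np.log(rng.uniform()) < log_alpha
    return (proposal if accept else x), bool(accept)

# Toy target (assumption): unit Gaussian, action S(x) = x^2 / 2.
action = lambda x: 0.5 * np.sum(x**2)
grad = lambda x: x

rng = np.random.default_rng(1)
x = np.zeros(1)
samples, n_accept = [], 0
for _ in range(20000):
    x, accepted = mala_step(x, action, grad, eps=0.5, rng=rng)
    samples.append(x[0])
    n_accept += accepted

samples = np.array(samples)
acc_rate = n_accept / len(samples)
```

The accept/reject step removes the step-size bias of the plain Langevin update, which is the sense in which such an adjustment can make the generated ensemble exact.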
HEP-LAT Dec 2, 2024
Diffusion models learn distributions generated by complex Langevin dynamics
Diaa E. Habibi, Gert Aarts, Lingxiao Wang et al.
The probability distribution effectively sampled by a complex Langevin process for theories with a sign problem is not known a priori and notoriously hard to understand. Diffusion models, a class of generative AI, can learn distributions from data. In this contribution, we explore the ability of diffusion models to learn the distributions created by a complex Langevin process.
DIS-NN Nov 20, 2024
Dyson Brownian motion and random matrix dynamics of weight matrices during learning
Gert Aarts, Ouraman Hajizadeh, Biagio Lucini et al.
During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
HEP-LAT Oct 1, 2025
Combining complex Langevin dynamics with score-based and energy-based diffusion models
Gert Aarts, Diaa E. Habibi, Lingxiao Wang et al.
Theories with a sign problem due to a complex action or Boltzmann weight can sometimes be numerically solved using a stochastic process in the complexified configuration space. However, the probability distribution effectively sampled by this complex Langevin process is not known a priori and notoriously hard to understand. In generative AI, diffusion models can learn distributions, or their log derivatives, from data. We explore the ability of diffusion models to learn the distributions sampled by a complex Langevin process, comparing score-based and energy-based diffusion models, and speculate about possible applications.
DIS-NN Sep 1, 2025
Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks
Chanju Park, Biagio Lucini, Gert Aarts
Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochastic gradient descent can be obtained by studying a neural network's phase diagram, in which each phase is characterised by distinctive dynamics of the singular values of weight matrices. Taking inspiration from disordered systems, we start from the observation that the loss landscape of a multilayer neural network with mean squared error can be interpreted as a disordered system in feature space, where the learnt features are mapped to soft spin degrees of freedom, the initial variance of the weight matrices is interpreted as the strength of the disorder, and temperature is given by the ratio of the learning rate and the batch size. As the model is trained, three phases can be identified, in which the dynamics of weight matrices is qualitatively different. Employing a Langevin equation for stochastic gradient descent, previously derived using Dyson Brownian motion, we demonstrate that the three dynamical regimes can be classified effectively, providing practical guidance for the choice of hyperparameters of the optimiser.
IM Mar 18, 2025
Strategic White Paper on AI Infrastructure for Particle, Nuclear, and Astroparticle Physics: Insights from JENA and EuCAIF
Sascha Caron, Andreas Ipp, Gert Aarts et al.
Artificial intelligence (AI) is transforming scientific research, with deep learning methods playing a central role in data analysis, simulations, and signal detection across particle, nuclear, and astroparticle physics. Within the JENA communities (ECFA, NuPECC, and APPEC) and as part of the EuCAIF initiative, AI integration is advancing steadily. However, broader adoption remains constrained by challenges such as limited computational resources, a lack of expertise, and difficulties in transitioning from research and development (R&D) to production. This white paper provides a strategic roadmap, informed by a community survey, to address these barriers. It outlines critical infrastructure requirements, prioritizes training initiatives, and proposes funding strategies to scale AI capabilities across fundamental physics over the next five years.
HEP-LAT Dec 29, 2024
Random Matrix Theory for Stochastic Gradient Descent
Chanju Park, Matteo Favoni, Biagio Lucini et al.
Investigating the dynamics of learning in machine learning algorithms is of paramount importance for understanding how and why an approach may be successful. The tools of physics and statistics provide a robust setting for such investigations. Here we apply concepts from random matrix theory to describe stochastic weight matrix dynamics, using the framework of Dyson Brownian motion. We derive the linear scaling rule between the learning rate (step size) and the batch size, and identify universal and non-universal aspects of weight matrix dynamics. We test our findings in the (near-)solvable case of the Gaussian Restricted Boltzmann Machine and in a linear one-hidden-layer neural network.
HEP-LAT Feb 10, 2022
Applications of Machine Learning to Lattice Quantum Field Theory
Denis Boyda, Salvatore Calì, Sam Foreman et al.
There is great potential to apply machine learning in the area of numerical lattice quantum field theory, but full exploitation of that potential will require new strategies. In this white paper for the Snowmass community planning process, we discuss the unique requirements of machine learning for lattice quantum field theory research and outline what is needed to enable exploration and deployment of this approach in the future.
AI Dec 29, 2021
Towards a Shapley Value Graph Framework for Medical peer-influence
Jamie Duell, Monika Seisenberger, Gert Aarts et al.
eXplainable Artificial Intelligence (XAI) is a sub-field of Artificial Intelligence (AI) that is at the forefront of AI research. In XAI, feature attribution methods produce explanations in the form of feature importance. People often use feature importance as guidance for intervention. However, a limitation of existing feature attribution methods is that they do not explain the consequences of intervention. In other words, although contribution towards a certain prediction is highlighted by feature attribution methods, the relation between features and the consequence of intervention is not studied. The aim of this paper is to introduce a new framework, called a peer influence framework, that looks deeper into explanations using a graph representation of feature-to-feature interactions, in order to improve the interpretability of black-box Machine Learning models and inform intervention.
LG Oct 21, 2021
Quantum field theories, Markov random fields and machine learning
Dimitrios Bachtis, Gert Aarts, Biagio Lucini
The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning within quantum field theory. Here, we will discuss how discretized Euclidean field theories, such as the $φ^{4}$ lattice field theory on a square lattice, are mathematically equivalent to Markov fields, a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. The results are established based on the Hammersley-Clifford theorem. We will then derive neural networks from quantum field theories and discuss applications pertinent to the minimization of the Kullback-Leibler divergence for the probability distribution of the $φ^{4}$ machine learning algorithms and other probability distributions.
LG Sep 16, 2021
Machine learning with quantum field theories
Dimitrios Bachtis, Gert Aarts, Biagio Lucini
The precise equivalence between discretized Euclidean field theories and a certain class of probabilistic graphical models, namely the mathematical framework of Markov random fields, opens up the opportunity to investigate machine learning from the perspective of quantum field theory. In this contribution we will demonstrate, through the Hammersley-Clifford theorem, that the $φ^{4}$ scalar field theory on a square lattice satisfies the local Markov property and can therefore be recast as a Markov random field. We will then derive from the $φ^{4}$ theory machine learning algorithms and neural networks which can be viewed as generalizations of conventional neural network architectures. Finally, we will conclude by presenting applications based on the minimization of an asymmetric distance between the probability distribution of the $φ^{4}$ machine learning algorithms and target probability distributions.
HEP-LAT Feb 18, 2021
Quantum field-theoretic machine learning
Dimitrios Bachtis, Gert Aarts, Biagio Lucini
We derive machine learning algorithms from discretized Euclidean field theories, making inference and learning possible within dynamics described by quantum field theory. Specifically, we demonstrate that the $φ^{4}$ scalar field theory satisfies the Hammersley-Clifford theorem, therefore recasting it as a machine learning algorithm within the mathematically rigorous framework of Markov random fields. We illustrate the concepts by minimizing an asymmetric distance between the probability distribution of the $φ^{4}$ theory and that of target distributions, by quantifying the overlap of statistical ensembles between probability distributions and through reweighting to complex-valued actions with longer-range interactions. Neural network architectures are additionally derived from the $φ^{4}$ theory which can be viewed as generalizations of conventional neural networks and applications are presented. We conclude by discussing how the proposal opens up a new research avenue, that of developing a mathematical and computational framework of machine learning within quantum field theory.
HEP-LAT Sep 30, 2020
Adding machine learning within Hamiltonians: Renormalization group transformations, symmetry breaking and restoration
Dimitrios Bachtis, Gert Aarts, Biagio Lucini
We present a physical interpretation of machine learning functions, opening up the possibility to control properties of statistical systems via the inclusion of these functions in Hamiltonians. In particular, we include the predictive function of a neural network, designed for phase classification, as a conjugate variable coupled to an external field within the Hamiltonian of a system. Results in the two-dimensional Ising model evidence that the field can induce an order-disorder phase transition by breaking or restoring the symmetry, in contrast with the field of the conventional order parameter which causes explicit symmetry breaking. The critical behavior is then studied by proposing a Hamiltonian-agnostic reweighting approach and forming a renormalization group mapping on quantities derived from the neural network. Accurate estimates of the critical point and of the critical exponents related to the operators that govern the divergence of the correlation length are provided. We conclude by discussing how the method provides an essential step toward bridging machine learning and physics.
STAT-MECH Apr 29, 2020
Extending machine learning classification capabilities with histogram reweighting
Dimitrios Bachtis, Gert Aarts, Biagio Lucini
We propose the use of Monte Carlo histogram reweighting to extrapolate predictions of machine learning methods. In our approach, we treat the output from a convolutional neural network as an observable in a statistical system, enabling its extrapolation over continuous ranges in parameter space. We demonstrate our proposal using the phase transition in the two-dimensional Ising model. By interpreting the output of the neural network as an order parameter, we explore connections with known observables in the system and investigate its scaling behaviour. A finite size scaling analysis is conducted based on quantities derived from the neural network that yields accurate estimates for the critical exponents and the critical temperature. The method improves the prospects of acquiring precision measurements from machine learning in physical systems without an order parameter and those where direct sampling in regions of parameter space might not be possible.
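The reweighting idea can be sketched with the standard single-histogram estimator, in which any per-configuration quantity sampled at inverse temperature `beta0` is extrapolated to a nearby `beta`. The function name `reweight` and the toy numbers are assumptions for illustration. In the setting of the abstract, `obs` would hold the neural network output evaluated on each sampled configuration and `energy` the corresponding configuration energies.

```python
import numpy as np

def reweight(obs, energy, beta0, beta):
    """Estimate <obs> at inverse temperature beta from samples drawn at beta0,
    using weights proportional to exp(-(beta - beta0) * E_i)."""
    obs = np.asarray(obs, dtype=float)
    log_w = -(beta - beta0) * np.asarray(energy, dtype=float)
    log_w -= log_w.max()  # subtract the maximum to stabilise the exponentials
    w = np.exp(log_w)
    return np.sum(w * obs) / np.sum(w)

# Toy data (hypothetical): four samples taken at beta0 = 1.
energy = [0.0, 0.0, 1.0, 1.0]
obs = [0.0, 0.0, 1.0, 1.0]
same_beta = reweight(obs, energy, beta0=1.0, beta=1.0)               # plain sample mean
shifted = reweight(obs, energy, beta0=1.0, beta=1.0 + np.log(2.0))   # weights 1, 1, 1/2, 1/2
```

The estimator is only reliable while the energy histograms at `beta0` and `beta` overlap appreciably, which limits how far a single ensemble can be extrapolated.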