Biagio Lucini

HEP-LAT
h-index: 5
11 papers
123 citations
Novelty: 48%
AI Score: 35

11 Papers

DIS-NN · Jul 23, 2024
Stochastic weight matrix dynamics during learning and Dyson Brownian motion

Gert Aarts, Biagio Lucini, Chanju Park

We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion, thereby inheriting many features of random matrix theory. We relate the level of stochasticity to the ratio of the learning rate and the mini-batch size, providing more robust evidence for a previously conjectured scaling relationship. We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model and in the (near-)solvable case of the Gaussian restricted Boltzmann machine.
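
As a rough illustration of the Wigner surmise mentioned in this abstract, the sketch below samples 2x2 real symmetric Gaussian (GOE) matrices, for which the surmise P(s) = (π/2) s exp(-π s²/4) is the exact level-spacing distribution. The function names, sample size and seed are illustrative assumptions and are not taken from the paper, which studies weight matrices during training rather than freshly sampled random matrices.

```python
import numpy as np

def goe_2x2_spacings(n_samples=100_000, seed=0):
    """Eigenvalue spacings of 2x2 GOE matrices, rescaled to unit mean spacing."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal(n_samples)                 # diagonal entries, variance 1
    d = rng.standard_normal(n_samples)
    b = rng.standard_normal(n_samples) / np.sqrt(2.0)  # off-diagonal entry, variance 1/2
    # Eigenvalues of [[a, b], [b, d]] differ by sqrt((a - d)^2 + 4 b^2).
    spacing = np.sqrt((a - d) ** 2 + 4.0 * b ** 2)
    return spacing / spacing.mean()

def wigner_surmise(s):
    """Wigner surmise for the orthogonal ensemble, normalised to unit mean spacing."""
    return 0.5 * np.pi * s * np.exp(-0.25 * np.pi * s ** 2)

if __name__ == "__main__":
    s = goe_2x2_spacings()
    hist, edges = np.histogram(s, bins=30, range=(0.0, 3.0), density=True)
    centres = 0.5 * (edges[1:] + edges[:-1])
    # Small spacings are suppressed (eigenvalue repulsion): P(s) vanishes linearly at s = 0.
    print(np.max(np.abs(hist - wigner_surmise(centres))))
```

The linear vanishing of the histogram near s = 0 is the eigenvalue repulsion that the Coulomb gas picture encodes.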

DIS-NN · Nov 20, 2024
Dyson Brownian motion and random matrix dynamics of weight matrices during learning

Gert Aarts, Ouraman Hajizadeh, Biagio Lucini et al.

During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.
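
The Marchenko-Pastur spectrum at initialisation can be checked numerically in a few lines. The sketch below assumes a rectangular weight matrix with i.i.d. Gaussian entries (an assumption; the actual nano-GPT initialisation is not specified in the abstract) and compares the eigenvalues of W^T W / N with the Marchenko-Pastur support edges σ²(1 ± √q)², where q is the aspect ratio. Matrix sizes and names are illustrative.

```python
import numpy as np

def mp_edges(q, sigma2=1.0):
    """Lower and upper edges of the Marchenko-Pastur support for aspect ratio q <= 1."""
    return sigma2 * (1.0 - np.sqrt(q)) ** 2, sigma2 * (1.0 + np.sqrt(q)) ** 2

def init_spectrum(n_rows=2000, n_cols=500, sigma2=1.0, seed=0):
    """Eigenvalues of W^T W / n_rows for an i.i.d. Gaussian W (assumed initialisation)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, np.sqrt(sigma2), size=(n_rows, n_cols))
    return np.linalg.eigvalsh(W.T @ W / n_rows)

if __name__ == "__main__":
    eig = init_spectrum()
    lo, hi = mp_edges(500 / 2000)
    print(f"support edges: [{lo:.3f}, {hi:.3f}]")
    print(f"observed range: [{eig.min():.3f}, {eig.max():.3f}]")
```

Structure that develops during training would show up as eigenvalues leaving this support or as a change in the bulk shape.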

DIS-NN · Sep 1, 2025
Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks

Chanju Park, Biagio Lucini, Gert Aarts

Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochastic gradient descent can be obtained by studying a neural network's phase diagram, in which each phase is characterised by distinctive dynamics of the singular values of weight matrices. Taking inspiration from disordered systems, we start from the observation that the loss landscape of a multilayer neural network with mean squared error can be interpreted as a disordered system in feature space, where the learnt features are mapped to soft spin degrees of freedom, the initial variance of the weight matrices is interpreted as the strength of the disorder, and temperature is given by the ratio of the learning rate and the batch size. As the model is trained, three phases can be identified, in which the dynamics of weight matrices is qualitatively different. Employing a Langevin equation for stochastic gradient descent, previously derived using Dyson Brownian motion, we demonstrate that the three dynamical regimes can be classified effectively, providing practical guidance for the choice of hyperparameters of the optimiser.

HEP-LAT · Dec 29, 2024
Random Matrix Theory for Stochastic Gradient Descent

Chanju Park, Matteo Favoni, Biagio Lucini et al.

Investigating the dynamics of learning in machine learning algorithms is of paramount importance for understanding how and why an approach may be successful. The tools of physics and statistics provide a robust setting for such investigations. Here we apply concepts from random matrix theory to describe stochastic weight matrix dynamics, using the framework of Dyson Brownian motion. We derive the linear scaling rule between the learning rate (step size) and the batch size, and identify universal and non-universal aspects of weight matrix dynamics. We test our findings in the (near-)solvable case of the Gaussian Restricted Boltzmann Machine and in a linear one-hidden-layer neural network.
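
The linear scaling rule can be illustrated with a toy model that is much simpler than the matrix dynamics studied in the paper. The sketch below assumes a scalar weight w trained by SGD on the loss w²/2, with Gaussian gradient noise whose variance is reduced by 1/B through mini-batch averaging. In this toy model the stationary variance is lr·σ²/(B·(2 − lr)), so for small learning rates it depends only on the ratio lr/B. All names and parameter values are illustrative assumptions.

```python
import numpy as np

def stationary_variance(lr, batch_size, sigma2=1.0, steps=2000, replicas=20_000, seed=0):
    """Empirical stationary variance of w under noisy SGD on the loss w**2 / 2."""
    rng = np.random.default_rng(seed)
    w = np.zeros(replicas)
    noise_std = np.sqrt(sigma2 / batch_size)  # averaging over B samples divides the variance by B
    for _ in range(steps):
        grad = w + noise_std * rng.standard_normal(replicas)  # true gradient plus mini-batch noise
        w = w - lr * grad
    return float(w.var())

def predicted_variance(lr, batch_size, sigma2=1.0):
    """Exact stationary variance of the toy model; close to (lr / B) * sigma2 / 2 for small lr."""
    return lr * sigma2 / (batch_size * (2.0 - lr))

if __name__ == "__main__":
    # Doubling both the learning rate and the batch size keeps lr / B fixed.
    for lr, b in [(0.01, 8), (0.02, 16), (0.01, 16)]:
        print(lr, b, stationary_variance(lr, b), predicted_variance(lr, b))
```

The first two settings share the same lr/B and give nearly the same fluctuation size, while the third halves lr/B and roughly halves the variance.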

HEP-LAT · Feb 10, 2022
Applications of Machine Learning to Lattice Quantum Field Theory

Denis Boyda, Salvatore Calì, Sam Foreman et al.

There is great potential to apply machine learning in the area of numerical lattice quantum field theory, but full exploitation of that potential will require new strategies. In this white paper for the Snowmass community planning process, we discuss the unique requirements of machine learning for lattice quantum field theory research and outline what is needed to enable exploration and deployment of this approach in the future.

LG · Oct 21, 2021
Quantum field theories, Markov random fields and machine learning

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning within quantum field theory. Here, we will discuss how discretized Euclidean field theories, such as the $φ^{4}$ lattice field theory on a square lattice, are mathematically equivalent to Markov fields, a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. The results are established based on the Hammersley-Clifford theorem. We will then derive neural networks from quantum field theories and discuss applications pertinent to the minimization of the Kullback-Leibler divergence for the probability distribution of the $φ^{4}$ machine learning algorithms and other probability distributions.
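
For readers unfamiliar with the Kullback-Leibler divergence named in this abstract, here is a minimal sketch for discrete distributions. It only illustrates the definition and its asymmetry; the helper name and the example distributions are assumptions and have no connection to the lattice distributions used in the paper.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

if __name__ == "__main__":
    p = [0.5, 0.5]
    q = [0.9, 0.1]
    # The two directions differ, so the divergence is not a true distance.
    print(kl_divergence(p, q), kl_divergence(q, p))
```

Which direction is minimised matters in practice, because the two orderings penalise mismatches between the model and the target differently.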

STAT-MECH · Sep 22, 2021
Quantitative analysis of phase transitions in two-dimensional XY models using persistent homology

Nicholas Sale, Jeffrey Giansiracusa, Biagio Lucini

We use persistent homology and persistence images as an observable of three different variants of the two-dimensional XY model in order to identify and study their phase transitions. We examine models with the classical XY action, a topological lattice action, and an action with an additional nematic term. In particular, we introduce a new way of computing the persistent homology of lattice spin model configurations and, by considering the fluctuations in the output of logistic regression and k-nearest neighbours models trained on persistence images, we develop a methodology to extract estimates of the critical temperature and the critical exponent of the correlation length. We put particular emphasis on finite-size scaling behaviour and producing estimates with quantifiable error. For each model we successfully identify its phase transition(s) and are able to get an accurate determination of the critical temperatures and critical exponents of the correlation length.

LG · Sep 16, 2021
Machine learning with quantum field theories

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

The precise equivalence between discretized Euclidean field theories and a certain class of probabilistic graphical models, namely the mathematical framework of Markov random fields, opens up the opportunity to investigate machine learning from the perspective of quantum field theory. In this contribution we will demonstrate, through the Hammersley-Clifford theorem, that the $φ^{4}$ scalar field theory on a square lattice satisfies the local Markov property and can therefore be recast as a Markov random field. We will then derive from the $φ^{4}$ theory machine learning algorithms and neural networks which can be viewed as generalizations of conventional neural network architectures. Finally, we will conclude by presenting applications based on the minimization of an asymmetric distance between the probability distribution of the $φ^{4}$ machine learning algorithms and target probability distributions.
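
The local Markov property can be made concrete with a small numerical check. The sketch below assumes one common parametrisation of the lattice φ⁴ action on a periodic square lattice, S = Σ_x [−2κ Σ_μ φ_x φ_{x+μ} + φ_x² + λ(φ_x² − 1)²]; the paper may use a different but equivalent form, and the couplings, lattice size and function names are illustrative. It verifies that the change in the action when one site is updated depends only on that site and its four nearest neighbours, which is what makes the conditional distribution of a site local.

```python
import numpy as np

def action(phi, kappa=0.2, lam=1.0):
    """Assumed lattice phi^4 action on a periodic square lattice (each bond counted once)."""
    hop = phi * (np.roll(phi, -1, axis=0) + np.roll(phi, -1, axis=1))
    return float(np.sum(-2.0 * kappa * hop + phi ** 2 + lam * (phi ** 2 - 1.0) ** 2))

def local_action(phi, x, y, value, kappa=0.2, lam=1.0):
    """All terms of the action that involve site (x, y), evaluated as if it held `value`."""
    n0, n1 = phi.shape
    nbrs = (phi[(x + 1) % n0, y] + phi[(x - 1) % n0, y]
            + phi[x, (y + 1) % n1] + phi[x, (y - 1) % n1])
    return float(-2.0 * kappa * value * nbrs + value ** 2 + lam * (value ** 2 - 1.0) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((6, 6))
    new = phi.copy()
    new[2, 3] = 0.7
    full = action(new) - action(phi)
    local = local_action(phi, 2, 3, 0.7) - local_action(phi, 2, 3, phi[2, 3])
    print(full, local)  # the two differences agree up to rounding
```

Since p(φ_x | rest) ∝ exp(−S) and only the local terms depend on φ_x, conditioning on the four neighbours is enough, which is the statement used with the Hammersley-Clifford theorem.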

HEP-LAT · Feb 18, 2021
Quantum field-theoretic machine learning

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

We derive machine learning algorithms from discretized Euclidean field theories, making inference and learning possible within dynamics described by quantum field theory. Specifically, we demonstrate that the $φ^{4}$ scalar field theory satisfies the Hammersley-Clifford theorem, therefore recasting it as a machine learning algorithm within the mathematically rigorous framework of Markov random fields. We illustrate the concepts by minimizing an asymmetric distance between the probability distribution of the $φ^{4}$ theory and that of target distributions, by quantifying the overlap of statistical ensembles between probability distributions and through reweighting to complex-valued actions with longer-range interactions. Neural network architectures are additionally derived from the $φ^{4}$ theory which can be viewed as generalizations of conventional neural networks and applications are presented. We conclude by discussing how the proposal opens up a new research avenue, that of developing a mathematical and computational framework of machine learning within quantum field theory.

HEP-LAT · Sep 30, 2020
Adding machine learning within Hamiltonians: Renormalization group transformations, symmetry breaking and restoration

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

We present a physical interpretation of machine learning functions, opening up the possibility to control properties of statistical systems via the inclusion of these functions in Hamiltonians. In particular, we include the predictive function of a neural network, designed for phase classification, as a conjugate variable coupled to an external field within the Hamiltonian of a system. Results in the two-dimensional Ising model evidence that the field can induce an order-disorder phase transition by breaking or restoring the symmetry, in contrast with the field of the conventional order parameter which causes explicit symmetry breaking. The critical behavior is then studied by proposing a Hamiltonian-agnostic reweighting approach and forming a renormalization group mapping on quantities derived from the neural network. Accurate estimates of the critical point and of the critical exponents related to the operators that govern the divergence of the correlation length are provided. We conclude by discussing how the method provides an essential step toward bridging machine learning and physics.

STAT-MECH · Apr 29, 2020
Extending machine learning classification capabilities with histogram reweighting

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

We propose the use of Monte Carlo histogram reweighting to extrapolate predictions of machine learning methods. In our approach, we treat the output from a convolutional neural network as an observable in a statistical system, enabling its extrapolation over continuous ranges in parameter space. We demonstrate our proposal using the phase transition in the two-dimensional Ising model. By interpreting the output of the neural network as an order parameter, we explore connections with known observables in the system and investigate its scaling behaviour. A finite-size scaling analysis is conducted based on quantities derived from the neural network that yields accurate estimates for the critical exponents and the critical temperature. The method improves the prospects of acquiring precision measurements from machine learning in physical systems without an order parameter and those where direct sampling in regions of parameter space might not be possible.
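
Single-histogram reweighting itself is compact enough to sketch. Given configurations sampled at inverse temperature β₀ with energies E_i and any per-configuration observable O_i (here this could be a neural-network output), the estimate at a nearby β₁ is Σ_i O_i e^{−(β₁−β₀)E_i} / Σ_i e^{−(β₁−β₀)E_i}. The function name and the two-state example are illustrative assumptions, not the authors' code.

```python
import numpy as np

def reweight(observable, energy, beta0, beta1):
    """Estimate <O> at beta1 from samples generated at beta0 (single-histogram reweighting)."""
    observable = np.asarray(observable, dtype=float)
    energy = np.asarray(energy, dtype=float)
    log_w = -(beta1 - beta0) * energy
    log_w -= log_w.max()  # subtract the maximum to avoid overflow in the exponentials
    w = np.exp(log_w)
    return float(np.sum(w * observable) / np.sum(w))

if __name__ == "__main__":
    # Two-state toy system with energies 0 and 1, sampled uniformly at beta0 = 0.
    energy = np.array([0.0, 1.0])
    print(reweight(energy, energy, beta0=0.0, beta1=1.0))  # exact <E> is e^-1 / (1 + e^-1)
```

In practice the extrapolation is only reliable while the energy histograms at β₀ and β₁ overlap substantially, which limits how far a single simulation can be reweighted.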