Biagio Lucini

DIS-NN

h-index38

10papers

9,047citations

Novelty43%

AI Score33

Ranked #116,097 of 194,257 authors (top 60%)#53 in DIS-NN (top 50%)

10 Papers

5.1DIS-NNJul 23, 2024

Stochastic weight matrix dynamics during learning and Dyson Brownian motion

Gert Aarts, Biagio Lucini, Chanju Park

We demonstrate that the update of weight matrices in learning algorithms can be described in the framework of Dyson Brownian motion, thereby inheriting many features of random matrix theory. We relate the level of stochasticity to the ratio of the learning rate and the mini-batch size, providing more robust evidence to a previously conjectured scaling relationship. We discuss universal and non-universal features in the resulting Coulomb gas distribution and identify the Wigner surmise and Wigner semicircle explicitly in a teacher-student model and in the (near-)solvable case of the Gaussian restricted Boltzmann machine.

DIS-NNJun 26

Spectral phase transitions and trainability in neural network learning dynamics

Chanju Park, Dario Bocchi, Francesco D'Amico et al.

The emergence of low-dimensional structures in the spectra of neural network weight matrices is a common empirical feature of trained models, but the dynamical origin of this phenomenon during learning remains an open problem. We formulate neural network training as the stochastic evolution of an initially random matrix ensemble, driven by stochastic gradient descent (SGD) updates that reshape the spectral bulk while amplifying signal strength. This induces a Baik-Ben Arous-Péché (BBP) transition during training, where isolated eigenvalues detach from the random bulk distribution, providing a dynamical framework for representation formation in high-dimensional learning dynamics. We demonstrate this in a solvable linear teacher-student model, where spectral evolution is analytically tractable and a phase diagram of trainability governed by the step size (or learning rate) and initial weight variance is obtained, and subsequently extend our formalism beyond the linear regime to nonlinear and stochastic settings. Numerical simulations in realistic settings support this picture, showing robust emergence of spectral alignment during training. Our results suggest that spectral analysis may provide a unified perspective of stochastic learning dynamics, linking trainability, optimisation hyperparameters, spectral phase transitions, and representation learning in neural networks.

3.3DIS-NNNov 20, 2024

Dyson Brownian motion and random matrix dynamics of weight matrices during learning

Gert Aarts, Ouraman Hajizadeh, Biagio Lucini et al.

During training, weight matrices in machine learning architectures are updated using stochastic gradient descent or variations thereof. In this contribution we employ concepts of random matrix theory to analyse the resulting stochastic matrix dynamics. We first demonstrate that the dynamics can generically be described using Dyson Brownian motion, leading to e.g. eigenvalue repulsion. The level of stochasticity is shown to depend on the ratio of the learning rate and the mini-batch size, explaining the empirically observed linear scaling rule. We verify this linear scaling in the restricted Boltzmann machine. Subsequently we study weight matrix dynamics in transformers (a nano-GPT), following the evolution from a Marchenko-Pastur distribution for eigenvalues at initialisation to a combination with additional structure at the end of learning.

1.2DIS-NNSep 1, 2025

Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks

Chanju Park, Biagio Lucini, Gert Aarts

Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochastic gradient descent can be obtained by studying a neural network's phase diagram, in which each phase is characterised by distinctive dynamics of the singular values of weight matrices. Taking inspiration from disordered systems, we start from the observation that the loss landscape of a multilayer neural network with mean squared error can be interpreted as a disordered system in feature space, where the learnt features are mapped to soft spin degrees of freedom, the initial variance of the weight matrices is interpreted as the strength of the disorder, and temperature is given by the ratio of the learning rate and the batch size. As the model is trained, three phases can be identified, in which the dynamics of weight matrices is qualitatively different. Employing a Langevin equation for stochastic gradient descent, previously derived using Dyson Brownian motion, we demonstrate that the three dynamical regimes can be classified effectively, providing practical guidance for the choice of hyperparameters of the optimiser.

1.2HEP-LATDec 29, 2024

Random Matrix Theory for Stochastic Gradient Descent

Chanju Park, Matteo Favoni, Biagio Lucini et al.

Investigating the dynamics of learning in machine learning algorithms is of paramount importance for understanding how and why an approach may be successful. The tools of physics and statistics provide a robust setting for such investigations. Here we apply concepts from random matrix theory to describe stochastic weight matrix dynamics, using the framework of Dyson Brownian motion. We derive the linear scaling rule between the learning rate (step size) and the batch size, and identify universal and non-universal aspects of weight matrix dynamics. We test our findings in the (near-)solvable case of the Gaussian Restricted Boltzmann Machine and in a linear one-hidden-layer neural network.

6.6HEP-LATFeb 10, 2022

Applications of Machine Learning to Lattice Quantum Field Theory

Denis Boyda, Salvatore Calì, Sam Foreman et al.

There is great potential to apply machine learning in the area of numerical lattice quantum field theory, but full exploitation of that potential will require new strategies. In this white paper for the Snowmass community planning process, we discuss the unique requirements of machine learning for lattice quantum field theory research and outline what is needed to enable exploration and deployment of this approach in the future.

3.1LGOct 21, 2021

Quantum field theories, Markov random fields and machine learning

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

The transition to Euclidean space and the discretization of quantum field theories on spatial or space-time lattices opens up the opportunity to investigate probabilistic machine learning within quantum field theory. Here, we will discuss how discretized Euclidean field theories, such as the $φ^{4}$ lattice field theory on a square lattice, are mathematically equivalent to Markov fields, a notable class of probabilistic graphical models with applications in a variety of research areas, including machine learning. The results are established based on the Hammersley-Clifford theorem. We will then derive neural networks from quantum field theories and discuss applications pertinent to the minimization of the Kullback-Leibler divergence for the probability distribution of the $φ^{4}$ machine learning algorithms and other probability distributions.

3.1LGSep 16, 2021

Machine learning with quantum field theories

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

The precise equivalence between discretized Euclidean field theories and a certain class of probabilistic graphical models, namely the mathematical framework of Markov random fields, opens up the opportunity to investigate machine learning from the perspective of quantum field theory. In this contribution we will demonstrate, through the Hammersley-Clifford theorem, that the $φ^{4}$ scalar field theory on a square lattice satisfies the local Markov property and can therefore be recast as a Markov random field. We will then derive from the $φ^{4}$ theory machine learning algorithms and neural networks which can be viewed as generalizations of conventional neural network architectures. Finally, we will conclude by presenting applications based on the minimization of an asymmetric distance between the probability distribution of the $φ^{4}$ machine learning algorithms and target probability distributions.

11.3HEP-LATFeb 18, 2021

Quantum field-theoretic machine learning

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

We derive machine learning algorithms from discretized Euclidean field theories, making inference and learning possible within dynamics described by quantum field theory. Specifically, we demonstrate that the $φ^{4}$ scalar field theory satisfies the Hammersley-Clifford theorem, therefore recasting it as a machine learning algorithm within the mathematically rigorous framework of Markov random fields. We illustrate the concepts by minimizing an asymmetric distance between the probability distribution of the $φ^{4}$ theory and that of target distributions, by quantifying the overlap of statistical ensembles between probability distributions and through reweighting to complex-valued actions with longer-range interactions. Neural network architectures are additionally derived from the $φ^{4}$ theory which can be viewed as generalizations of conventional neural networks and applications are presented. We conclude by discussing how the proposal opens up a new research avenue, that of developing a mathematical and computational framework of machine learning within quantum field theory.

8.6STAT-MECHApr 29, 2020

Extending machine learning classification capabilities with histogram reweighting

Dimitrios Bachtis, Gert Aarts, Biagio Lucini

We propose the use of Monte Carlo histogram reweighting to extrapolate predictions of machine learning methods. In our approach, we treat the output from a convolutional neural network as an observable in a statistical system, enabling its extrapolation over continuous ranges in parameter space. We demonstrate our proposal using the phase transition in the two-dimensional Ising model. By interpreting the output of the neural network as an order parameter, we explore connections with known observables in the system and investigate its scaling behaviour. A finite size scaling analysis is conducted based on quantities derived from the neural network that yields accurate estimates for the critical exponents and the critical temperature. The method improves the prospects of acquiring precision measurements from machine learning in physical systems without an order parameter and those where direct sampling in regions of parameter space might not be possible.