Luke Causer

2 Papers

STAT-MECH · Jul 15, 2024
Discrete generative diffusion models without stochastic differential equations: a tensor network approach

Luke Causer, Grant M. Rotskoff, Juan P. Garrahan · stanford

Diffusion models (DMs) are a class of generative machine learning methods that sample a target distribution by transforming samples of a trivial (often Gaussian) distribution using a learned stochastic differential equation. In standard DMs, this is done by learning a "score function" that reverses the effect of adding diffusive noise to the distribution of interest. Here we consider the generalisation of DMs to lattice systems with discrete degrees of freedom, where noise is added via Markov chain jump dynamics. We show how to use tensor networks (TNs) to efficiently define and sample such "discrete diffusion models" (DDMs) without explicitly having to solve a stochastic differential equation. We show the following: (i) by parametrising the data and evolution operators as TNs, the denoising dynamics can be represented exactly; (ii) the auto-regressive nature of TNs allows samples to be generated efficiently and without bias; (iii) for sampling Boltzmann-like distributions, TNs allow the construction of an efficient learning scheme that integrates well with Monte Carlo. We illustrate this approach by studying the equilibrium of two models with non-trivial thermodynamics, the $d=1$ constrained Fredkin chain and the $d=2$ Ising model.
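As a rough illustration of the exact denoising in point (i), the sketch below treats a single two-state site rather than a lattice, and uses plain matrices rather than tensor networks. The function names, the flip probability and the initial distribution are assumptions made for this example and do not come from the paper. It builds the forward marginals of a symmetric flip chain and inverts each noising step with Bayes' rule.

```python
import numpy as np

def forward_marginals(p0, flip_prob, n_steps):
    """Marginals p_t of a two-state chain that flips with probability flip_prob per step."""
    # Column-stochastic transition matrix, indexed as T[new_state, old_state].
    T = np.array([[1.0 - flip_prob, flip_prob],
                  [flip_prob, 1.0 - flip_prob]])
    marginals = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        marginals.append(T @ marginals[-1])
    return T, marginals

def reverse_kernel(T, p_prev, p_next):
    """Bayes-rule denoising kernel R[old, new] = T[new, old] * p_prev[old] / p_next[new]."""
    return (T.T * p_prev[:, None]) / p_next[None, :]

def denoise(T, marginals):
    """Propagate the final (noisy) marginal back to time 0 with the exact reverse kernels."""
    p = marginals[-1]
    for t in range(len(marginals) - 1, 0, -1):
        p = reverse_kernel(T, marginals[t - 1], marginals[t]) @ p
    return p

# Hypothetical toy values, chosen only for illustration.
p0 = np.array([0.9, 0.1])
T, marginals = forward_marginals(p0, flip_prob=0.2, n_steps=10)
recovered = denoise(T, marginals)
```

Here `marginals[-1]` is close to uniform, and `recovered` matches `p0` up to floating-point error. On a lattice the state space grows exponentially with the number of sites, so these objects cannot be stored as dense arrays, which is where a compressed representation such as a TN becomes necessary.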

STAT-MECH · Jun 23, 2023
Minibatch training of neural network ensembles via trajectory sampling

Jamie F. Mair, Luke Causer, Juan P. Garrahan

Most iterative neural network training methods use estimates of the loss function over small random subsets (or minibatches) of the data to update the parameters, which helps decouple the training time from the (often very large) size of the training dataset. Here, we show that a minibatch approach can also be used to train neural network ensembles (NNEs) via trajectory methods in a highly efficient manner. We illustrate this approach by training NNEs to classify images in the MNIST dataset. This method improves training times, with the speed-up scaling as the ratio of the size of the dataset to the average minibatch size, which, in the case of MNIST, typically gives a computational improvement of two orders of magnitude. We highlight the advantage of using longer trajectories to represent NNEs, both for improved accuracy in inference and reduced update cost in terms of the samples needed in minibatch updates.
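The minibatch idea in the first sentence can be sketched as follows. This is a minimal example on synthetic linear-regression data, not the trajectory sampling scheme of the paper; the function names, data and batch size are all assumptions made for illustration. It compares the loss over the full dataset with estimates computed on random minibatches, each of which costs only a fraction `batch_size / n_samples` of a full evaluation.

```python
import numpy as np

def mse_loss(w, X, y):
    """Mean squared error of the linear model X @ w against targets y."""
    return float(np.mean((X @ w - y) ** 2))

def minibatch_losses(w, X, y, batch_size, rng):
    """Loss estimates on a random partition of the data into equal-sized minibatches."""
    # Assumes len(y) is divisible by batch_size; np.split raises an error otherwise.
    perm = rng.permutation(len(y))
    batches = np.split(perm, len(y) // batch_size)
    return [mse_loss(w, X[idx], y[idx]) for idx in batches]

# Hypothetical synthetic data, for illustration only.
rng = np.random.default_rng(0)
n_samples, n_features, batch_size = 1000, 5, 50
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

w = np.zeros(n_features)                       # parameters being evaluated
full_loss = mse_loss(w, X, y)                  # touches all n_samples points
batch_losses = minibatch_losses(w, X, y, batch_size, rng)
cost_ratio = n_samples / batch_size            # full-loss cost relative to one minibatch
```

Each entry of `batch_losses` is a noisy estimate of `full_loss`, and because the minibatches are equal in size and partition the data, their mean reproduces `full_loss`.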