Hierarchical autoregressive neural networks for statistical systems

arXiv:2203.10989v2 · 16 citations · h-index: 16
Originality: Incremental advance
AI Analysis

This work addresses scaling limitations for researchers in computational physics and statistical mechanics, offering an incremental improvement over prior neural network methods.

The paper tackles the unfavorable scaling of neural networks that approximate probability distributions of statistical systems such as the Ising model. It proposes a hierarchical association of physical degrees of freedom (for instance spins) to neurons, so that the network width scales with the linear extent $L$ of the system rather than with the total number of degrees of freedom ($L^2$ for an $L\times L$ lattice). Simulations cover lattices up to $128 \times 128$ spins, with time benchmarks reaching $512 \times 512$. The authors report improved training quality, variational free energy estimates closer to the theoretical value, and shorter autocorrelation times in Markov Chain Monte Carlo simulations.

It was recently proposed that neural networks could be used to approximate many-dimensional probability distributions that appear e.g. in lattice field theories or statistical mechanics. Subsequently they can be used as variational approximators to assess extensive properties of statistical systems, like free energy, and also as neural samplers used in Monte Carlo simulations. The practical application of this approach is unfortunately limited by the unfavorable scaling, with the system size, of both the numerical cost required for training and the memory requirements. This is because the original proposition involved a neural network whose width scaled with the total number of degrees of freedom, e.g. $L^2$ in the case of a two-dimensional $L\times L$ lattice. In this work we propose a hierarchical association of physical degrees of freedom, for instance spins, to neurons, which replaces this with a scaling with the linear extent $L$ of the system. We demonstrate our approach on the two-dimensional Ising model by simulating lattices of various sizes up to $128 \times 128$ spins, with time benchmarks reaching lattices of size $512 \times 512$. We observe that our proposal improves the quality of neural network training, i.e. the approximated probability distribution is closer to the target than could be previously achieved. As a consequence, the variational free energy reaches a value closer to its theoretical expectation and, if applied in a Markov Chain Monte Carlo algorithm, the resulting autocorrelation time is smaller. Finally, the replacement of a single neural network by a hierarchy of smaller networks considerably reduces the memory requirements.
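The role of the variational free energy in the abstract can be illustrated with a minimal sketch. It is not the paper's hierarchical network: a factorized distribution with a single parameter `p_up` stands in for the trained model, and the lattice size `L = 3`, coupling `J = 1`, inverse temperature `BETA = 0.3`, and periodic boundaries are all assumptions chosen so that every configuration can be enumerated exactly. For any normalized distribution $q$, the quantity $F_q = \frac{1}{\beta}\sum_s q(s)\,[\ln q(s) + \beta E(s)]$ is an upper bound on the true free energy $F = -\frac{1}{\beta}\ln Z$, which is why a better approximation gives a value closer to the theoretical one.

```python
import itertools
import math

L = 3       # linear lattice size (assumed tiny so all 2**(L*L) states can be enumerated)
BETA = 0.3  # inverse temperature (illustrative value, not taken from the paper)
J = 1.0     # ferromagnetic coupling (assumed)


def energy(spins):
    """Nearest-neighbour Ising energy with periodic boundaries.

    `spins` is a flat tuple of +1/-1 values in row-major order.
    """
    e = 0.0
    for x in range(L):
        for y in range(L):
            s = spins[x * L + y]
            right = spins[x * L + (y + 1) % L]
            down = spins[((x + 1) % L) * L + y]
            e -= J * s * (right + down)  # each bond is counted exactly once
    return e


def log_q(spins, p_up):
    """Log-probability under a factorized stand-in model: each spin is +1 with probability p_up."""
    n_up = sum(1 for s in spins if s == 1)
    return n_up * math.log(p_up) + (len(spins) - n_up) * math.log(1.0 - p_up)


def all_configurations():
    """Iterate over every spin configuration of the L x L lattice."""
    return itertools.product((-1, 1), repeat=L * L)


def exact_free_energy():
    """F = -(1/beta) ln Z, computed by brute-force enumeration."""
    z = sum(math.exp(-BETA * energy(s)) for s in all_configurations())
    return -math.log(z) / BETA


def variational_free_energy(p_up):
    """F_q = (1/beta) * sum_s q(s) [ln q(s) + beta E(s)], computed exactly."""
    total = 0.0
    for s in all_configurations():
        lq = log_q(s, p_up)
        total += math.exp(lq) * (lq + BETA * energy(s))
    return total / BETA


if __name__ == "__main__":
    print("exact F      :", exact_free_energy())
    for p in (0.5, 0.6, 0.7):
        print(f"F_q (p={p}) :", variational_free_energy(p))
```

In a realistic setting the sum over configurations is replaced by an average over samples drawn from the model, and the gap between $F_q$ and $F$ serves as a measure of training quality.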

