LG STAT-MECH NE May 16, 2022

Training neural networks using Metropolis Monte Carlo and an adaptive variant

Stephen Whitelam, Viktor Selin, Ian Benlolo, Corneel Casert, Isaac Tamblyn

arXiv:2205.07408v23.312 citations h-index: 32 Has Code

Originality: Incremental advance

AI Analysis

This work offers a training method for neural networks that complements gradient descent and gives access to architectures that gradient descent cannot train. The advance is incremental because it builds on existing Monte Carlo techniques.

The authors train neural networks with the zero-temperature Metropolis Monte Carlo algorithm and find that it reaches an accuracy comparable to that of gradient descent, but can fail when a network's structure or neuron activations are strongly heterogeneous. To address this they develop an adaptive variant, aMC, which trains deep and recurrent networks whose gradients are too small or too large for gradient descent.

We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
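As a rough illustration of the zero-temperature Metropolis procedure described in the abstract, the sketch below perturbs every parameter of a tiny network with Gaussian noise and keeps the trial only if the loss does not increase. It is not the authors' code. The XOR data set, the one-hidden-layer tanh network, the step size `sigma`, and the function names `forward`, `loss`, and `metropolis_train` are all assumptions chosen for the example. The adaptive variant aMC is not shown.

```python
import math
import random

HIDDEN = 4  # hypothetical hidden-layer width for this toy example


def forward(params, x):
    """One-hidden-layer tanh network with two inputs and one output.

    Parameter layout (an assumption of this sketch):
    [w1 (2*HIDDEN), b1 (HIDDEN), w2 (HIDDEN), b2 (1)]
    """
    w1 = params[:2 * HIDDEN]
    b1 = params[2 * HIDDEN:3 * HIDDEN]
    w2 = params[3 * HIDDEN:4 * HIDDEN]
    out = params[4 * HIDDEN]
    for j in range(HIDDEN):
        h = math.tanh(w1[2 * j] * x[0] + w1[2 * j + 1] * x[1] + b1[j])
        out += w2[j] * h
    return out


def loss(params, data):
    """Mean squared error over the data set."""
    return sum((forward(params, x) - y) ** 2 for x, y in data) / len(data)


def metropolis_train(data, n_params, steps=5000, sigma=0.05, seed=0):
    """Zero-temperature Metropolis: accept a trial move only if the
    loss does not increase; otherwise keep the current parameters."""
    rng = random.Random(seed)
    params = [rng.gauss(0.0, 0.5) for _ in range(n_params)]
    current = loss(params, data)
    history = [current]
    for _ in range(steps):
        # Propose a Gaussian perturbation of every parameter at once.
        trial = [p + rng.gauss(0.0, sigma) for p in params]
        trial_loss = loss(trial, data)
        if trial_loss <= current:
            params, current = trial, trial_loss
        history.append(current)
    return params, history


XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]

params, history = metropolis_train(XOR, n_params=4 * HIDDEN + 1)
print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Because a move is accepted only when the loss does not rise, the recorded loss is non-increasing by construction. No gradient is computed at any point, which is why vanishing or exploding gradients do not enter the update.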

