LG | STAT-MECH | NE | May 16, 2022

Training neural networks using Metropolis Monte Carlo and an adaptive variant

arXiv:2205.07408v2 | 12 citations | h-index: 32
Originality: Incremental advance
AI Analysis

This work provides a complementary training method for neural networks that gives access to architectures inaccessible to gradient descent. The advance is incremental because it builds on existing Monte Carlo techniques.

The authors train neural networks with the zero-temperature Metropolis Monte Carlo algorithm. They find that it achieves accuracy comparable to that of gradient descent, but that it can fail when a network's structure or neuron activations are strongly heterogeneous. To address this they develop an adaptive variant, aMC, which trains deep and recurrent networks whose gradients are too small or too large for gradient descent.

We examine the zero-temperature Metropolis Monte Carlo algorithm as a tool for training a neural network by minimizing a loss function. We find that, as expected on theoretical grounds and shown empirically by other authors, Metropolis Monte Carlo can train a neural net with an accuracy comparable to that of gradient descent, if not necessarily as quickly. The Metropolis algorithm does not fail automatically when the number of parameters of a neural network is large. It can fail when a neural network's structure or neuron activations are strongly heterogeneous, and we introduce an adaptive Monte Carlo algorithm, aMC, to overcome these limitations. The intrinsic stochasticity and numerical stability of the Monte Carlo method allow aMC to train deep neural networks and recurrent neural networks in which the gradient is too small or too large to allow training by gradient descent. Monte Carlo methods offer a complement to gradient-based methods for training neural networks, allowing access to a distinct set of network architectures and principles.
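As a rough illustration of the zero-temperature Metropolis rule described in the abstract, the sketch below perturbs every parameter of a small network with Gaussian noise and keeps the move only if the loss does not increase. This is not the authors' code and it is not aMC: the XOR dataset, the network width `HIDDEN`, the step size `sigma`, the step count, and all function names are assumptions chosen for the example.

```python
import math
import random

# Toy XOR dataset (illustrative assumption, not taken from the paper).
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
HIDDEN = 4                 # hidden-layer width (assumption)
N_PARAMS = 4 * HIDDEN + 1  # per hidden unit: 2 input weights, 1 bias, 1 output weight; plus 1 output bias


def predict(params, x):
    """Forward pass of a 2-HIDDEN-1 tanh network stored as a flat parameter list."""
    out = params[-1]  # output bias
    for j in range(HIDDEN):
        base = 4 * j
        pre = params[base] * x[0] + params[base + 1] * x[1] + params[base + 2]
        out += params[base + 3] * math.tanh(pre)
    return out


def loss(params):
    """Mean squared error over the toy dataset."""
    return sum((predict(params, x) - y) ** 2 for x, y in DATA) / len(DATA)


def train_zero_temperature_mc(steps=20000, sigma=0.05, seed=0):
    """Zero-temperature Metropolis: accept a proposed move only if the loss does not increase."""
    rng = random.Random(seed)
    params = [rng.gauss(0.0, 1.0) for _ in range(N_PARAMS)]
    current = loss(params)
    history = [current]
    for _ in range(steps):
        # Propose a Gaussian perturbation of all parameters at once.
        trial = [p + rng.gauss(0.0, sigma) for p in params]
        trial_loss = loss(trial)
        if trial_loss <= current:  # T -> 0 limit of the Metropolis acceptance rule
            params, current = trial, trial_loss
        history.append(current)
    return params, history


if __name__ == "__main__":
    _, hist = train_zero_temperature_mc()
    print(f"initial loss {hist[0]:.4f}, final loss {hist[-1]:.4f}")
```

Because moves that raise the loss are always rejected, the recorded loss is non-increasing by construction. No gradient is ever computed, which is why this kind of update remains usable when gradients vanish or explode.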

Code Implementations: 1 repo