DIS-NN STAT-MECH LG COMP-PH

Jan 3, 2025

Dissecting a Small Artificial Neural Network

Xiguang Yang, Krish Arora, Michael Bachmann

arXiv:2501.08341v1

h-index: 2

Journal of Physics A: Mathematical and Theoretical

Originality: Synthesis-oriented

AI Analysis

This paper offers theoretical insight into neural network training dynamics for researchers studying optimization and its analogies with statistical physics, though the contribution is incremental because it focuses on a simple model.

The researchers analyzed the loss landscape and backpropagation dynamics of a minimal XOR neural network. They found that cross-sections of the loss landscape in parameter space help explain why backpropagation converges efficiently even though the weights and biases keep drifting. They also introduced the microcanonical entropy to characterize phase behavior analogous to that of thermodynamic systems.

We investigate the loss landscape and backpropagation dynamics of convergence for the simplest possible artificial neural network representing the logical exclusive-OR (XOR) gate. Cross-sections of the loss landscape in the nine-dimensional parameter space are found to exhibit distinct features, which help explain why backpropagation efficiently achieves convergence toward zero loss, whereas the values of weights and biases keep drifting. Differences in the shapes of cross-sections obtained by nonrandomized and randomized batches are discussed. In reference to statistical physics, we introduce the microcanonical entropy as a unique quantity that allows one to characterize the phase behavior of the network. Learning in neural networks can thus be thought of as an annealing process that experiences the analogue of phase transitions known from thermodynamic systems. The entropy also reveals how the loss landscape simplifies as more hidden neurons are added to the network, eliminating entropic barriers caused by finite-size effects.
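
To make the idea of a loss-landscape cross-section concrete, the following is a minimal sketch and not the authors' code. It assumes that the nine parameters belong to a 2-2-1 network (four hidden weights, two hidden biases, two output weights, one output bias), and that the network uses sigmoid activations and a mean-squared-error loss over the full XOR truth table; the abstract states none of these details. The names `forward`, `loss`, `cross_section`, and the hand-picked parameter vector `theta_star` are hypothetical and serve only as illustration.

```python
import math

# XOR truth table: (inputs, target).
XOR_DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(theta, x):
    """Output of a 2-2-1 network for input x.

    Assumed layout of the nine parameters: four hidden weights,
    two hidden biases, two output weights, one output bias.
    """
    w11, w12, w21, w22, b1, b2, v1, v2, c = theta
    h1 = sigmoid(w11 * x[0] + w12 * x[1] + b1)
    h2 = sigmoid(w21 * x[0] + w22 * x[1] + b2)
    return sigmoid(v1 * h1 + v2 * h2 + c)


def loss(theta):
    """Mean squared error over all four XOR patterns (full batch)."""
    return sum((forward(theta, x) - t) ** 2 for x, t in XOR_DATA) / len(XOR_DATA)


def cross_section(theta, index, values):
    """Loss along one parameter axis, with the other eight held fixed."""
    profile = []
    for v in values:
        probe = list(theta)
        probe[index] = v
        profile.append(loss(probe))
    return profile


# Hand-picked (not trained) parameters that approximately solve XOR:
# h1 acts like OR, h2 like AND, and the output like "h1 and not h2".
theta_star = [20.0, 20.0, 20.0, 20.0, -10.0, -30.0, 20.0, -20.0, -10.0]

# Scan the output bias (index 8) from -30 to 30 in steps of 0.5.
grid = [-30.0 + 0.5 * i for i in range(121)]
profile = cross_section(theta_star, 8, grid)
```

Plotting `profile` against `grid` gives a one-dimensional cut through the nine-dimensional landscape. Repeating the scan for other parameter indices, or evaluating `loss` on a subset of `XOR_DATA` instead of the full table, is one way to explore the batch-dependent differences in shape that the abstract mentions.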
