AI · Dec 24, 2024

Understanding Artificial Neural Network's Behavior from Neuron Activation Perspective

arXiv:2412.18073v1 · 2.3 · h-index: 1

Originality: Incremental advance

AI Analysis

This paper addresses the fundamental challenge of interpreting neural network scaling for researchers and practitioners. It offers a theoretical foundation that connects empirical observations to testable predictions, though it is incremental in that it builds on existing scaling-law research.

This paper tackles the problem of understanding deep neural network behavior by analyzing neuron activation dynamics, resulting in theoretical insights that explain neural scaling laws and generalization under over-parameterization, with derived mathematical relationships such as power-law distributions.

This paper explores the behavior of deep neural networks (DNNs) through the lens of neuron activation dynamics. We propose a probabilistic framework that analyzes a model's neuron activation patterns as a stochastic process, uncovering theoretical insights into neural scaling laws, such as over-parameterization and the power-law decay of loss with respect to dataset size. By deriving key mathematical relationships, we show that the number of activated neurons increases in the form of $N(1-(\frac{bN}{D+bN})^b)$, and that neuron activation should follow a power-law distribution. Based on these two mathematical results, we demonstrate how DNNs maintain generalization capabilities even under over-parameterization, and we elucidate the phase transition phenomenon observed in loss curves when dataset size is plotted on a logarithmic axis (i.e., the data magnitude increases linearly). Moreover, by combining these two phenomena with the power-law distribution of neuron activation, we derive the power-law decay of a neural network's loss function as the data size increases. Furthermore, our analysis bridges the gap between empirical observations and theoretical underpinnings, offering experimentally testable predictions regarding parameter efficiency and model compressibility. These findings provide a foundation for understanding neural network scaling and present new directions for optimizing DNN performance.
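To get a feel for the activated-neuron expression $N(1-(\frac{bN}{D+bN})^b)$ quoted in the abstract, the following minimal sketch simply evaluates it numerically. The abstract does not define its symbols, so reading $N$ as the number of neurons, $D$ as the dataset size, and $b$ as a positive constant is an assumption here, as are the function name `activated_neurons` and the sample values; none of this is the authors' code.

```python
def activated_neurons(n_neurons: float, dataset_size: float, b: float) -> float:
    """Evaluate N * (1 - (b*N / (D + b*N))**b), the expression from the abstract.

    Assumed interpretation (not stated in the abstract):
    n_neurons = N, dataset_size = D, b = a positive constant.
    """
    if n_neurons <= 0 or b <= 0 or dataset_size < 0:
        raise ValueError("expected N > 0, b > 0 and D >= 0")
    # Fraction that shrinks from 1 toward 0 as the dataset grows.
    ratio = (b * n_neurons) / (dataset_size + b * n_neurons)
    return n_neurons * (1.0 - ratio ** b)


if __name__ == "__main__":
    N, b = 1000.0, 2.0  # illustrative values only, not taken from the paper
    for exponent in range(0, 7):
        D = 10.0 ** exponent
        print(f"D = 1e{exponent}: activated = {activated_neurons(N, D, b):.2f}")
```

Under these assumptions the expression is 0 when $D = 0$, increases monotonically with $D$, and saturates at $N$ once $D$ is much larger than $bN$, which is consistent with the abstract's statement that the count of activated neurons grows with dataset size.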
