LG · PR · ML · Dec 6, 2024

Stably unactivated neurons in ReLU neural networks

arXiv:2412.06829v2
Originality: Incremental advance
AI Analysis

This work addresses a theoretical question about neural network expressiveness that is of interest to machine learning researchers. It is incremental, since it builds on existing studies of how architecture affects expressiveness.

The paper studies stably unactivated neurons in ReLU neural networks, which reduce expressiveness. When weights and biases are initialized from symmetric probability distributions, it derives exact probabilities that a neuron in the second hidden layer is stably unactivated. For example, for input dimension $n_0$ and a first hidden layer with $n_0+1$ neurons, this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$.
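As a quick arithmetic check of the $n_0+1$ formula (a worked example of our own, not taken from the paper), it splits into two terms and gives the following values for small $n_0$:

```latex
\frac{2^{n_0}+1}{4^{n_0+1}}
  = \frac{2^{n_0}}{2^{2n_0+2}} + \frac{1}{4^{n_0+1}}
  = \frac{1}{2^{n_0+2}} + \frac{1}{4^{n_0+1}},
\qquad
n_0 = 0, 1, 2, 3:\quad \frac{1}{2},\ \frac{3}{16},\ \frac{5}{64},\ \frac{9}{256}.
```

The first term, $1/2^{n_0+2}$, is what the abstract's formula $1/2^{n_1+1}$ would give at $n_1 = n_0+1$. This comparison is purely arithmetic, since that formula is only stated for $n_1 \le n_0$.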

The choice of a neural network's architecture influences which functions the network can realize, and, as a result, the expressiveness of different architectures has received much attention. In ReLU neural networks, the presence of stably unactivated neurons (neurons that output zero for every input) can reduce the network's expressiveness. In this work, we investigate the probability that a neuron in the second hidden layer of such a network is stably unactivated when the weights and biases are initialized from symmetric probability distributions. For networks with input dimension $n_0$, we prove that if the first hidden layer has $n_0+1$ neurons, this probability is exactly $\frac{2^{n_0}+1}{4^{n_0+1}}$, and if it has $n_1 \le n_0$ neurons, the probability is $\frac{1}{2^{n_1+1}}$. Finally, for a first hidden layer with more than $n_0+1$ neurons, we propose a conjecture, explain its rationale, and present computational evidence supporting it.
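The two closed forms can be sanity-checked numerically in the smallest nontrivial setting. The sketch below is our own illustration, not the paper's code. It assumes input dimension $n_0 = 1$ and all weights and biases drawn i.i.d. from a standard normal (one particular symmetric distribution). The names `exact_probability`, `stably_unactivated_1d` and `estimate_probability` are hypothetical. With a one-dimensional input, the second-layer preactivation is a piecewise-linear function of $x$. "Stably unactivated" can therefore be checked exactly by looking at the kinks and at the slopes towards $\pm\infty$.

```python
import random
from fractions import Fraction


def exact_probability(n0: int, n1: int) -> Fraction:
    """Closed forms quoted in the abstract (only the two proven cases)."""
    if n1 <= n0:
        return Fraction(1, 2 ** (n1 + 1))
    if n1 == n0 + 1:
        return Fraction(2 ** n0 + 1, 4 ** (n0 + 1))
    raise ValueError("no closed form is stated for n1 > n0 + 1")


def stably_unactivated_1d(a, c, w, b) -> bool:
    """Exact test (input dimension n0 = 1) that
    f(x) = sum_i w[i] * relu(a[i] * x + c[i]) + b  is <= 0 for every real x.

    f is piecewise linear with kinks at t_i = -c[i] / a[i], so it is
    non-positive everywhere iff it is non-positive at every kink and it
    does not increase towards +infinity or -infinity.
    """
    def f(x):
        return sum(wi * max(0.0, ai * x + ci) for ai, ci, wi in zip(a, c, w)) + b

    kinks = [-ci / ai for ai, ci in zip(a, c)]
    if any(f(t) > 0 for t in kinks):
        return False
    # As x -> +inf only units with a_i > 0 are active: slope must be <= 0.
    right_slope = sum(wi * ai for ai, wi in zip(a, w) if ai > 0)
    # As x -> -inf only units with a_i < 0 are active: f must not grow as x
    # decreases, i.e. its slope there must be >= 0.
    left_slope = sum(wi * ai for ai, wi in zip(a, w) if ai < 0)
    return right_slope <= 0 and left_slope >= 0


def estimate_probability(n1: int, trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate for n0 = 1, all parameters i.i.d. N(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, 1.0) for _ in range(n1)]  # first-layer weights
        c = [rng.gauss(0.0, 1.0) for _ in range(n1)]  # first-layer biases
        w = [rng.gauss(0.0, 1.0) for _ in range(n1)]  # second-layer weights
        b = rng.gauss(0.0, 1.0)                       # second-layer bias
        hits += stably_unactivated_1d(a, c, w, b)
    return hits / trials


if __name__ == "__main__":
    for n1 in (1, 2):
        exact = exact_probability(1, n1)
        print(f"n0=1, n1={n1}: exact {exact} = {float(exact):.4f}, "
              f"estimate {estimate_probability(n1):.4f}")
```

With $n_0 = 1$, the estimates for $n_1 = 1$ and $n_1 = 2$ should approach $1/4$ and $3/16$, respectively. Higher input dimensions would need a different exact test, for example one linear program per activation pattern, which this sketch does not attempt.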
