LG · IT · Feb 5

How Controlling the Variance can Improve Training Stability of Sparsely Activated DNNs and CNNs

arXiv:2602.05779v1 · h-index: 2
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of reducing energy consumption in machine learning models with fully connected layers, though the contribution appears incremental, as it builds on existing Edge-of-Chaos initialization strategies.

The paper tackles training instability in deep networks with sparsely activated layers. It shows that initializations leading to larger fixed Gaussian process variances improve both expressivity and stability, reaching up to 90% activation sparsity in DNNs and CNNs while maintaining full or near-full accuracy.
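Under the Gaussian-process view, a hidden pre-activation at initialisation is approximately distributed as N(0, q), where q is the limiting variance. A shifted ReLU outputs exactly zero whenever x ≤ τ (the upper clip m of a clipped variant adds no extra zeros), so its expected sparsity is Φ(τ/√q). The minimal sketch below assumes exactly Gaussian pre-activations and uses hypothetical helper names; it inverts that relation to find the threshold for a target sparsity such as 90%.

```python
from statistics import NormalDist


def threshold_for_sparsity(sparsity: float, q: float) -> float:
    """Threshold tau with P(x <= tau) = sparsity for x ~ N(0, q).

    A shifted ReLU max(x - tau, 0) outputs exactly zero when x <= tau,
    so this tau gives the requested expected fraction of zero activations.
    """
    return q ** 0.5 * NormalDist().inv_cdf(sparsity)


def expected_sparsity(tau: float, q: float) -> float:
    """Expected fraction of zero outputs of max(x - tau, 0) for x ~ N(0, q)."""
    return NormalDist().cdf(tau / q ** 0.5)


if __name__ == "__main__":
    for q in (0.5, 1.0, 4.0):
        tau = threshold_for_sparsity(0.9, q)
        print(f"q={q}: tau={tau:.4f}, expected sparsity={expected_sparsity(tau, q):.3f}")
```

For 90% sparsity this gives τ ≈ 1.28√q, so the threshold that produces a given sparsity at initialisation scales with the square root of the chosen variance q.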

The intermediate layers of deep networks can be characterised as a Gaussian process; in particular, the Edge-of-Chaos (EoC) initialisation strategy prescribes the limiting covariance matrix of that Gaussian process. Here we show that the choice of the Gaussian process variance, which has so far been under-utilised, is important in training deep networks with sparsity-inducing activations such as a shifted and clipped ReLU, $\text{CReLU}_{\tau,m}(x)=\min(\max(x-\tau,0),m)$. Specifically, initialisations leading to larger fixed Gaussian process variances allow for improved expressivity with activation sparsity as large as 90% in DNNs and CNNs, and generally improve the stability of the training process. Enabling full, or near full, accuracy at such high levels of sparsity in the hidden layers suggests a promising mechanism to reduce the energy consumption of machine learning models involving fully connected layers.
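To make the role of the chosen variance concrete, the sketch below applies the standard mean-field Edge-of-Chaos conditions from the deep-information-propagation literature to $\text{CReLU}_{\tau,m}$. This is an assumption about how such an initialisation can be computed, not necessarily the paper's exact procedure. Pick a target fixed-point variance q*, set the weight variance σ_w² so that χ₁ = σ_w² E[φ'(√q* z)²] = 1, then set the bias variance σ_b² so that q* is a fixed point of the variance map q ↦ σ_w² E[φ(√q z)²] + σ_b². The function names are hypothetical, and the second moment is computed numerically.

```python
import math

import numpy as np


def crelu(x, tau, m):
    """Shifted and clipped ReLU: min(max(x - tau, 0), m), elementwise."""
    return np.minimum(np.maximum(x - tau, 0.0), m)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gaussian_expectation(f, q, n=200_001, zmax=10.0):
    """Approximate E[f(sqrt(q) * z)], z ~ N(0, 1), with a dense Riemann sum."""
    z = np.linspace(-zmax, zmax, n)
    dz = z[1] - z[0]
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return float(np.sum(f(math.sqrt(q) * z) * pdf) * dz)


def eoc_variances(q_star, tau, m):
    """Return (sigma_w^2, sigma_b^2) placing a CReLU network on the EoC
    with fixed-point pre-activation variance q_star.

    Mean-field conditions used:
      chi_1 = sigma_w^2 * E[phi'(sqrt(q*) z)^2] = 1
      q*    = sigma_w^2 * E[phi(sqrt(q*) z)^2] + sigma_b^2
    A negative sigma_b^2 means no EoC point exists for this (q*, tau, m).
    """
    s = math.sqrt(q_star)
    # phi'(x) is 1 on (tau, tau + m) and 0 elsewhere, so E[phi'^2] is the
    # probability that a N(0, q*) pre-activation lands in the linear region.
    p_linear = std_normal_cdf((tau + m) / s) - std_normal_cdf(tau / s)
    sigma_w2 = 1.0 / p_linear
    second_moment = gaussian_expectation(lambda x: crelu(x, tau, m) ** 2, q_star)
    sigma_b2 = q_star - sigma_w2 * second_moment
    return sigma_w2, sigma_b2
```

As a sanity check, `eoc_variances(1.0, 0.0, 1e9)` reduces CReLU to a plain ReLU and returns approximately (2, 0), the familiar He-style ReLU initialisation. With q* = 1, τ ≈ 1.28 (about 90% of pre-activations below threshold) and a very large m, σ_w² comes out near 10, because only about one unit in ten passes gradient, while σ_b² stays positive at roughly 0.61.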

