ML · LG · Mar 2, 2023

Penalising the biases in norm regularisation enforces sparsity

arXiv:2303.01353v4 · 20 citations · h-index: 17
Originality: Incremental advance
AI Analysis

This work offers theoretical insight into regularization mechanisms for neural networks, addressing a gap in the understanding of how penalizing bias terms affects sparsity. The advance is incremental, but it clarifies a specific open question about norm regularization.

The paper tackles the theoretical understanding of norm regularization in neural networks, showing that penalizing bias terms enforces sparsity in the minimal norm interpolator for one-hidden-layer ReLU networks with unidimensional data, whereas omitting bias regularization leads to non-sparse solutions.

Controlling the parameters' norm often yields good generalisation when training neural networks. Beyond simple intuitions, the relation between regularising parameters' norm and obtained estimators remains theoretically misunderstood. For one hidden ReLU layer networks with unidimensional data, this work shows the parameters' norm required to represent a function is given by the total variation of its second derivative, weighted by a $\sqrt{1+x^2}$ factor. Notably, this weighting factor disappears when the norm of bias terms is not regularised. The presence of this additional weighting factor is of utmost significance as it is shown to enforce the uniqueness and sparsity (in the number of kinks) of the minimal norm interpolator. Conversely, omitting the bias' norm allows for non-sparse solutions. Penalising the bias terms in the regularisation, either explicitly or implicitly, thus leads to sparse estimators.
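
To make the weighting concrete: for a piecewise-linear function, the second derivative is a sum of point masses at the kinks, so the weighted total variation reduces to a finite sum over kinks. The following is a minimal sketch under that assumption, not the paper's code. The function name `kink_cost` and its arguments are hypothetical, and the sketch ignores the affine part of the function and any constant factors in the paper's exact statement.

```python
import numpy as np


def kink_cost(kinks, slope_changes, penalise_bias=True):
    """Weighted total variation of f'' for a piecewise-linear f.

    Assumption: f'' is a sum of point masses of size slope_changes[k]
    located at kinks[k], so the integral becomes a finite sum.
    """
    x = np.asarray(kinks, dtype=float)
    c = np.asarray(slope_changes, dtype=float)
    # sqrt(1 + x^2) weighting when biases are penalised, flat weighting otherwise
    weight = np.sqrt(1.0 + x**2) if penalise_bias else np.ones_like(x)
    return float(np.sum(weight * np.abs(c)))


# Hypothetical example: kinks at x = 0 and x = 2 with slope changes 1 and -0.5
with_bias = kink_cost([0.0, 2.0], [1.0, -0.5])
without_bias = kink_cost([0.0, 2.0], [1.0, -0.5], penalise_bias=False)
print(with_bias, without_bias)
```

With the bias penalty, a kink far from the origin costs more than the same kink near the origin. Without it, only the total slope change matters, so the location of the kinks is not penalised.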

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
