ML LG · Sep 2, 2022

Optimal bump functions for shallow ReLU networks: Weight decay, depth separation and the curse of dimensionality

arXiv:2209.01173v13.81 citations · h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the curse of dimensionality in neural network approximation for researchers in machine learning theory, but it is incremental as it builds on existing analysis of depth separation.

The paper studies how shallow ReLU networks (one hidden layer) with weight decay regularization interpolate data from a radially symmetric distribution. In the infinite-neuron, infinite-data limit, a unique radially symmetric minimizer exists whose weight decay regularizer and Lipschitz constant scale as d and √d, respectively. The weight decay regularizer grows exponentially in d if the label 1 is imposed on a ball of radius ε rather than only at the origin. By comparison, networks with two hidden layers can approximate the target without this curse of dimensionality.

In this note, we study how neural networks with a single hidden layer and ReLU activation interpolate data drawn from a radially symmetric distribution with target labels 1 at the origin and 0 outside the unit ball, if no labels are known inside the unit ball. With weight decay regularization and in the infinite neuron, infinite data limit, we prove that a unique radially symmetric minimizer exists, whose weight decay regularizer and Lipschitz constant grow as $d$ and $\sqrt{d}$ respectively. We furthermore show that the weight decay regularizer grows exponentially in $d$ if the label $1$ is imposed on a ball of radius $\varepsilon$ rather than just at the origin. By comparison, a neural network with two hidden layers can approximate the target function without encountering the curse of dimensionality.
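To make the label setup concrete, the following is a minimal sketch of a ReLU network with two hidden layers that takes the value 1 at the origin and 0 outside the Euclidean unit ball. It is an illustrative construction, not the one used in the paper: it composes the exact ReLU representation of the $\ell^1$ norm with a one-dimensional ramp, and relies on $\|x\|_1 \ge \|x\|_2$. The names `l1_bump` and `weight_decay` are hypothetical, and the regularizer convention (sum of squared weights, biases excluded) is an assumption. The sketch covers only the case where the label 1 is imposed at the origin, not on a ball of radius $\varepsilon$.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def l1_bump(x):
    """Two-hidden-layer ReLU network: relu(1 - sum_i (relu(x_i) + relu(-x_i))).

    Equals 1 at the origin and 0 wherever ||x||_1 >= 1, which includes
    every point with Euclidean norm >= 1 because ||x||_1 >= ||x||_2.
    """
    x = np.atleast_2d(x)
    # Hidden layer 1: 2d neurons computing relu(x_i) and relu(-x_i).
    h1 = np.concatenate([relu(x), relu(-x)], axis=1)
    # Hidden layer 2: a single neuron computing relu(1 - ||x||_1).
    return relu(1.0 - h1.sum(axis=1))


def weight_decay(d):
    """Sum of squared weights of l1_bump in dimension d.

    Assumed convention: biases are excluded from the count.
    """
    layer1 = 2 * d  # 2d weight vectors, each a signed unit vector +-e_i
    layer2 = 2 * d  # weight -1 on each of the 2d first-layer outputs
    output = 1      # readout weight 1
    return layer1 + layer2 + output


if __name__ == "__main__":
    d = 5
    rng = np.random.default_rng(0)
    # Sample points on the Euclidean sphere of radius 1.5 (outside the unit ball).
    pts = rng.normal(size=(4, d))
    pts = 1.5 * pts / np.linalg.norm(pts, axis=1, keepdims=True)
    print(l1_bump(np.zeros(d)), l1_bump(pts), weight_decay(d))
```

Under the assumed convention, the sum of squared weights of this particular network is $4d + 1$, so it grows linearly in $d$. With respect to the Euclidean norm its Lipschitz constant is $\sqrt{d}$, since $\|x\|_1 \le \sqrt{d}\,\|x\|_2$ with equality along the diagonal.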
