LGMLFeb 11, 2024

Depth Separations in Neural Networks: Separating the Dimension from the Accuracy

arXiv:2402.07248v22 citationsh-index: 22COLT
AI Analysis

This addresses a foundational theoretical problem in understanding neural network depth efficiency, with implications for all of ML/AI.

The paper proves an exponential size separation between depth 2 and depth 3 neural networks for approximating Lipschitz functions to constant accuracy under constant parameters, resolving an open problem and showing the curse of dimensionality in depth 2 networks.

We prove an exponential size separation between depth 2 and depth 3 neural networks (with real inputs), when approximating a $\mathcal{O}(1)$-Lipschitz target function to constant accuracy, with respect to a distribution with support in the unit ball, under the mild assumption that the weights of the depth 2 network are exponentially bounded. This resolves an open problem posed in \citet{safran2019depth}, and proves that the curse of dimensionality manifests itself in depth 2 approximation, even in cases where the target function can be represented efficiently using a depth 3 network. Previously, lower bounds that were used to separate depth 2 from depth 3 networks required that at least one of the Lipschitz constant, target accuracy or (some measure of) the size of the domain of approximation scale \emph{polynomially} with the input dimension, whereas in our result these parameters are fixed to be \emph{constants} independent of the input dimension: our parameters are simultaneously optimal. Our lower bound holds for a wide variety of activation functions, and is based on a novel application of a worst- to average-case random self-reducibility argument, allowing us to leverage depth 2 threshold circuits lower bounds in a new domain.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes