LG ML · Nov 15, 2021

Deep Network Approximation in Terms of Intrinsic Parameters

arXiv:2111.07964v29.913 citations

Originality: Incremental advance

AI Analysis

This paper addresses the computational cost of deep learning by showing that successful learning is possible with a small number of learnable parameters. It is rated incremental because it builds on existing approximation theory.

The paper demonstrates that deep neural networks can achieve high approximation accuracy with far fewer learnable parameters than is typically assumed. It proves by construction that ReLU networks with $n+2$ intrinsic parameters can approximate Lipschitz continuous functions on $[0,1]^d$ (with Lipschitz constant $\lambda$) to within an error of $5\lambda\sqrt{d}\,2^{-n}$, and it supports this with experiments on classification tasks.
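The stated bound lends itself to a quick back-of-the-envelope calculation: each extra intrinsic parameter halves the error, so reaching a tolerance $\varepsilon$ requires $n \ge \log_2(5\lambda\sqrt{d}/\varepsilon)$. The following is a minimal sketch, with hypothetical helper names (`intrinsic_error_bound`, `min_n_for_tolerance`) and illustrative values for $\lambda$, $d$ and $\varepsilon$. It only evaluates the formula from the summary; it does not construct the network.

```python
import math


def intrinsic_error_bound(lam: float, d: int, n: int) -> float:
    """Stated bound 5 * lam * sqrt(d) * 2**(-n) for a network with n + 2 intrinsic parameters."""
    return 5.0 * lam * math.sqrt(d) * 2.0 ** (-n)


def min_n_for_tolerance(lam: float, d: int, eps: float) -> int:
    """Smallest integer n >= 0 such that the stated bound is at most eps."""
    # Solve 5 * lam * sqrt(d) * 2**(-n) <= eps  <=>  n >= log2(5 * lam * sqrt(d) / eps).
    n = max(0, math.ceil(math.log2(5.0 * lam * math.sqrt(d) / eps)))
    # Correct for floating-point rounding near exact powers of two.
    while intrinsic_error_bound(lam, d, n) > eps:
        n += 1
    while n > 0 and intrinsic_error_bound(lam, d, n - 1) <= eps:
        n -= 1
    return n


if __name__ == "__main__":
    lam, d, eps = 1.0, 4, 0.01  # illustrative values, not taken from the paper
    n = min_n_for_tolerance(lam, d, eps)
    # n, total intrinsic parameters (n + 2), achieved bound
    print(n, n + 2, intrinsic_error_bound(lam, d, n))  # prints: 10 12 0.009765625
```

Because the bound shrinks geometrically in $n$, the number of intrinsic parameters needed grows only logarithmically in $1/\varepsilon$ (and in $\lambda\sqrt{d}$).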

One of the arguments to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity is generally accompanied by explosive growth in the number of parameters, which, in turn, leads to high computational costs. It is of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting to the target function. From an approximation perspective, this paper shows that the number of parameters that need to be learned can be significantly smaller than people typically expect. First, we theoretically design ReLU networks with a few learnable parameters to achieve an attractive approximation. We prove by construction that, for any Lipschitz continuous function $f$ on $[0,1]^d$ with a Lipschitz constant $\lambda>0$, a ReLU network with $n+2$ intrinsic parameters (those depending on $f$) can approximate $f$ with an exponentially small error $5\lambda\sqrt{d}\,2^{-n}$. This result is generalized to generic continuous functions. Furthermore, we show that the idea of learning a small number of parameters to achieve a good approximation can be observed numerically. We conduct several experiments to verify that training a small subset of the parameters can also achieve good results for classification problems if the other parameters are pre-specified or pre-trained on a related problem.
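The experimental idea of training only a small subset of parameters while the rest stay fixed can be illustrated with a toy sketch. This is not the paper's construction or experimental setup. It assumes a hypothetical one-hidden-layer ReLU network whose hidden breakpoints (`BREAKPOINTS`) are pre-specified and frozen, a made-up 1-D interval-classification task, and plain gradient descent on the logistic loss over only the 11 output-layer weights. All function names are illustrative.

```python
import math

# Hypothetical pre-specified hidden layer: ReLU units relu(x - b) with frozen
# breakpoints b. These values are an illustrative assumption, not from the paper.
BREAKPOINTS = [k / 10 for k in range(10)]  # 0.0, 0.1, ..., 0.9


def hidden_features(x: float) -> list[float]:
    """Frozen ReLU features, plus a constant 1.0 that feeds the trainable bias."""
    return [max(0.0, x - b) for b in BREAKPOINTS] + [1.0]


def dot(w: list[float], f: list[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, f))


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_loss(w: list[float], feats: list[list[float]], ys: list[int]) -> float:
    """Mean binary cross-entropy, computed as softplus(z) - y*z for stability."""
    total = 0.0
    for f, y in zip(feats, ys):
        z = dot(w, f)
        softplus = max(z, 0.0) + math.log1p(math.exp(-abs(z)))
        total += softplus - y * z
    return total / len(ys)


def train_output_layer(feats, ys, steps=2000, lr=0.3):
    """Gradient descent on the output weights only; the hidden layer never changes."""
    w = [0.0] * len(feats[0])  # the only learnable parameters (10 weights + 1 bias)
    m = len(ys)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for f, y in zip(feats, ys):
            err = sigmoid(dot(w, f)) - y
            for j, fj in enumerate(f):
                grad[j] += err * fj / m
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy 1-D task (assumed for illustration): label 1 on [0.3, 0.7], else 0.
xs = [i / 20 for i in range(21)]
ys = [1 if 0.3 <= x <= 0.7 else 0 for x in xs]
feats = [hidden_features(x) for x in xs]

w = train_output_layer(feats, ys)
acc = sum((sigmoid(dot(w, f)) > 0.5) == bool(y) for f, y in zip(feats, ys)) / len(ys)
print(f"loss {log_loss([0.0] * len(w), feats, ys):.3f} -> {log_loss(w, feats, ys):.3f}, accuracy {acc:.2f}")
```

Because the hidden layer is frozen, the logistic loss is convex in the trainable weights, so gradient descent with a small step reliably lowers it. In the paper's setting the fixed parameters come from the theoretical construction or from pre-training on a related problem, which this toy does not reproduce.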

