LG NA MLJun 22, 2020

Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth

arXiv:2006.12231v618.919 citations

Originality Highly original

AI Analysis

This provides a theoretical foundation for efficient function approximation in high-dimensional spaces, addressing a core challenge in machine learning, though it is incremental in the context of neural network approximation theory.

The paper introduces Floor-ReLU networks, a new type of neural network using Floor or ReLU activations, which achieve an approximation error of 3λd^{α/2}N^{-α√L} for Hölder functions on [0,1]^d, overcoming the curse of dimensionality for functions with moderate variation.

A new network with super approximation power is introduced. This network is built with Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron and hence we call such networks Floor-ReLU networks. For any hyper-parameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, it is shown that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can uniformly approximate a Hölder function $f$ on $[0,1]^d$ with an approximation error $3λd^{α/2}N^{-α\sqrt{L}}$, where $α\in(0,1]$ and $λ$ are the Hölder order and constant, respectively. More generally for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of continuity $ω_f(\cdot)$, the constructive approximation rate is $ω_f(\sqrt{d}\,N^{-\sqrt{L}})+2ω_f(\sqrt{d}){N^{-\sqrt{L}}}$. As a consequence, this new class of networks overcomes the curse of dimensionality in approximation power when the variation of $ω_f(r)$ as $r\to 0$ is moderate (e.g., $ω_f(r) \lesssim r^α$ for Hölder continuous functions), since the major term to be considered in our approximation rate is essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$ within the modulus of continuity.

View on arXiv PDF

Similar