Approximation with SiLU Networks: Constant Depth and Exponential Rates for Basic Operations
This paper offers theoretical insight into neural network approximation, specifically the trade-off between depth and activation-parameter optimization. The contribution is incremental but gives concrete efficiency bounds.
The paper tackles the approximation of the square function and of functions in Sobolev spaces using SiLU networks. It achieves approximation error ε with a two-layer network of constant width whose weights scale as β^±k, where k = O(ln(1/ε)), and extends this to networks of constant depth with O(ε^{-d/n}) parameters under optimal hyperparameters.
We present SiLU network constructions whose approximation efficiency depends critically on proper hyperparameter tuning. For the square function $x^2$, with optimally chosen shift $a$ and scale $\beta$, we achieve approximation error $\varepsilon$ using a two-layer network of constant width, where weights scale as $\beta^{\pm k}$ with $k = \mathcal{O}(\ln(1/\varepsilon))$. We then extend this approach through functional composition to Sobolev spaces, obtaining networks with depth $\mathcal{O}(1)$ and $\mathcal{O}(\varepsilon^{-d/n})$ parameters under optimal hyperparameter settings. Our work highlights the trade-off between architectural depth and activation parameter optimization in neural network approximation theory.
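To make the weight scaling concrete, the following is a minimal sketch of one way a constant-width SiLU network can approximate $x^2$ with weights of size $\beta^{\pm k}$. It is not the construction from the paper: it assumes a symmetric two-neuron layout with shift $a = 0$ and relies on the identity $\mathrm{SiLU}(t) + \mathrm{SiLU}(-t) = t\tanh(t/2) = t^2/2 - t^4/24 + \dots$. The names `approx_square` and `max_error`, and the defaults `beta=2.0`, `k=6`, are illustrative choices.

```python
import math


def silu(x: float) -> float:
    """SiLU activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def approx_square(x: float, beta: float = 2.0, k: int = 6) -> float:
    """Two-neuron SiLU approximation of x**2 (illustrative, not the paper's).

    Inner weights are +/- beta**(-k); the outer weight is 2 * beta**(2k).
    Since SiLU(t) + SiLU(-t) = t**2 / 2 - t**4 / 24 + ..., the output is
    x**2 minus a leading error term of beta**(-2k) * x**4 / 12.
    """
    h = beta ** (-k)                 # small inner weight, beta^{-k}
    outer = 2.0 * beta ** (2 * k)    # large outer weight, 2 * beta^{2k}
    return outer * (silu(h * x) + silu(-h * x))


def max_error(k: int, beta: float = 2.0, n: int = 201) -> float:
    """Largest absolute error against x**2 on a uniform grid over [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return max(abs(approx_square(x, beta, k) - x * x) for x in xs)


if __name__ == "__main__":
    for k in range(1, 9):
        print(k, max_error(k))
```

In this sketch the error on $[-1, 1]$ is bounded by $\beta^{-2k}/12$, so reaching error $\varepsilon$ needs only $k = \mathcal{O}(\ln(1/\varepsilon))$, which matches the scaling stated above while keeping the width fixed at two neurons.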