Approximation Rates for Neural Networks with General Activation Functions
This work addresses theoretical approximation limits for neural networks, a topic foundational to machine learning. The contribution is incremental: it extends prior results to broader classes of activation functions.
The paper studies the approximation of functions by neural networks with general activation functions. It extends dimension-independent approximation rates to polynomially decaying non-sigmoidal activation functions, gives weaker rates for bounded, integrable activation functions, and shows that a stratified sampling method improves the rates for polynomially decaying activation functions under mild additional assumptions.
We prove new results concerning the approximation rate of neural networks with general activation functions. Our first result concerns the rate of approximation of a two-layer neural network with a polynomially decaying non-sigmoidal activation function. We extend the previously obtained dimension-independent approximation rates to this new class of activation functions. Our second result gives a weaker, but still dimension-independent, approximation rate for a larger class of activation functions, removing the polynomial decay assumption. This result applies to any bounded, integrable activation function. Finally, we show that a stratified sampling approach can be used to improve the approximation rate for polynomially decaying activation functions under mild additional assumptions.
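To give some intuition for why stratified sampling can beat plain random sampling, the following minimal sketch compares the two on a toy one-dimensional integral. It is not the construction used in the paper: the integrand `f`, the interval [0, 1], the sample size, and all function names are assumptions chosen only for illustration. The general idea is that drawing one sample from each of n equal subintervals reduces the variance of the average compared with drawing n independent uniform samples.

```python
import numpy as np


def mc_estimate(f, n, rng):
    """Plain Monte Carlo: average f at n i.i.d. uniform points in [0, 1]."""
    x = rng.uniform(0.0, 1.0, size=n)
    return f(x).mean()


def stratified_estimate(f, n, rng):
    """Stratified sampling: one uniform point in each of n equal subintervals."""
    left_edges = np.arange(n) / n
    x = left_edges + rng.uniform(0.0, 1.0, size=n) / n
    return f(x).mean()


def rms_error(estimator, f, exact, n, trials, seed):
    """Root-mean-square error of an estimator over repeated independent trials."""
    rng = np.random.default_rng(seed)
    errs = np.array([estimator(f, n, rng) - exact for _ in range(trials)])
    return float(np.sqrt(np.mean(errs ** 2)))


def f(x):
    """Toy integrand (hypothetical choice): a smooth, polynomially decaying bump."""
    return 1.0 / (1.0 + x ** 2)


# Exact value of the integral of 1 / (1 + x^2) over [0, 1] is arctan(1) = pi / 4.
exact = np.pi / 4

mc_err = rms_error(mc_estimate, f, exact, n=64, trials=200, seed=0)
st_err = rms_error(stratified_estimate, f, exact, n=64, trials=200, seed=0)

print(f"plain Monte Carlo RMS error: {mc_err:.2e}")
print(f"stratified RMS error:        {st_err:.2e}")
```

With the same number of function evaluations, the stratified estimator should show a noticeably smaller error for a smooth integrand like this one. How such a gain carries over to neural network approximation rates depends on the assumptions stated in the paper, which this toy example does not reproduce.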