Rational Neural Networks have Expressivity Advantages

arXiv:2602.12390v11.4h-index: 23

Originality: Highly original

AI Analysis

This paper addresses the challenge of designing more efficient neural network architectures. For machine learning practitioners, it offers a trainable activation function that can improve model performance without changes to existing training pipelines.

The paper studies neural network expressivity through trainable low-degree rational activation functions. It shows that they are more expressive and parameter-efficient than standard fixed activations such as ReLU and Tanh, with an exponential gap in approximation efficiency: a rational-activation network can approximate a fixed-activation network to error ε with only poly(log log(1/ε)) overhead in size, while the converse requires Ω(log(1/ε)) parameters in the worst case.
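To get a feel for why the gap between the two rates is called exponential, the short sketch below tabulates log log(1/ε) and log(1/ε) for a few error targets. It only compares growth rates: the constants and the polynomial degree hidden in poly(·) and Ω(·) are not given in the summary and are left out, and the helper name `size_scales` is a hypothetical one chosen for this illustration.

```python
import math
from typing import Tuple


def size_scales(eps: float) -> Tuple[float, float]:
    """Return (log log(1/eps), log(1/eps)) using natural logarithms.

    These are bare growth rates only; the constants and polynomial
    degree in the stated bounds are unknown here and are omitted.
    """
    log_term = math.log(1.0 / eps)
    return math.log(log_term), log_term


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-8, 1e-16):
        loglog_term, log_term = size_scales(eps)
        print(f"eps={eps:.0e}  log log(1/eps)={loglog_term:.2f}  "
              f"log(1/eps)={log_term:.2f}")
```

Squaring the precision (for example, going from ε = 1e-8 to ε = 1e-16) doubles log(1/ε) but adds only log 2 to log log(1/ε), which is the sense in which one quantity is exponential in the other.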

We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: Any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate seamlessly into standard architectures and training pipelines, allowing rationals to match or outperform fixed activations under identical architectures and optimizers.
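As a rough illustration of what a low-degree rational activation can look like, the sketch below evaluates $r(x) = P(x)/Q(x)$ with a cubic numerator and a quadratic denominator term. Everything specific here is an assumption made for illustration, not the paper's parameterization: the class name `RationalActivation`, the degrees (3, 2), the identity initialization, and the pole-free denominator $Q(x) = 1 + |b_1 x + b_2 x^2|$, which is one commonly used way to keep $Q$ strictly positive.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RationalActivation:
    """Hypothetical rational activation r(x) = P(x) / Q(x).

    P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3
    Q(x) = 1 + |b_1 x + b_2 x^2|   (assumed form; never below 1, so no poles)
    """

    # Numerator coefficients a_0..a_3; the default makes r the identity map.
    num: List[float] = field(default_factory=lambda: [0.0, 1.0, 0.0, 0.0])
    # Denominator coefficients b_1..b_2; zeros give Q(x) = 1.
    den: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def __call__(self, x: float) -> float:
        p = sum(a * x ** k for k, a in enumerate(self.num))
        q = 1.0 + abs(sum(b * x ** (k + 1) for k, b in enumerate(self.den)))
        return p / q


if __name__ == "__main__":
    act = RationalActivation(den=[0.0, 1.0])  # r(x) = x / (1 + x^2)
    print([round(act(x), 3) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)])
```

In a deep learning framework, the entries of `num` and `den` would be registered as trainable parameters and updated by the same optimizer as the network weights, so that the activation could replace a fixed one without other changes to the architecture.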
