Rational Neural Networks have Expressivity Advantages
This work addresses the challenge of designing more efficient neural network architectures, offering machine learning practitioners a novel activation approach that can improve model performance without changes to existing training pipelines.
The paper tackles the problem of neural network expressivity by introducing trainable low-degree rational activation functions and showing that they are more expressive and parameter-efficient than standard fixed activations such as ReLU and Tanh. The gap in approximation efficiency is exponential: rational-activation networks approximate standard networks with only poly(log log(1/ε)) overhead in size, whereas the converse requires Ω(log(1/ε)) parameters in the worst case.
We study neural networks with trainable low-degree rational activation functions and show that they are more expressive and parameter-efficient than modern piecewise-linear and smooth activations such as ELU, LeakyReLU, LogSigmoid, PReLU, ReLU, SELU, CELU, Sigmoid, SiLU, Mish, Softplus, Tanh, Softmin, Softmax, and LogSoftmax. For an error target of $\varepsilon>0$, we establish approximation-theoretic separations: any network built from standard fixed activations can be uniformly approximated on compact domains by a rational-activation network with only $\mathrm{poly}(\log\log(1/\varepsilon))$ overhead in size, while the converse provably requires $\Omega(\log(1/\varepsilon))$ parameters in the worst case. This exponential gap persists at the level of full networks and extends to gated activations and transformer-style nonlinearities. In practice, rational activations integrate seamlessly into standard architectures and training pipelines, where they match or outperform fixed activations under identical architectures and optimizers.
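To make the idea of a trainable low-degree rational activation concrete, here is a minimal sketch in plain Python. The abstract does not specify a parameterization, so everything here is an assumption for illustration: the class name `RationalActivation`, the choice of degree (3, 2), and the pole-free denominator $Q(x) = 1 + |b_1 x + b_2 x^2|$. In a real network the coefficients would be learned alongside the weights.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RationalActivation:
    """Hypothetical type-(3, 2) rational activation r(x) = P(x) / Q(x).

    Illustrative only: the degrees and the "safe" denominator are
    assumptions, not the parameterization used in the paper.
    """

    # Numerator coefficients a_0..a_3 (would be trainable parameters).
    # The default [0, 1, 0, 0] makes the activation start as the identity.
    num: List[float] = field(default_factory=lambda: [0.0, 1.0, 0.0, 0.0])
    # Denominator coefficients b_1..b_2; the constant term is fixed to 1.
    den: List[float] = field(default_factory=lambda: [0.0, 0.0])

    def __call__(self, x: float) -> float:
        # P(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3
        p = sum(a * x**i for i, a in enumerate(self.num))
        # Q(x) = 1 + |b_1 x + b_2 x^2| >= 1, so r has no poles on the real line.
        q = 1.0 + abs(sum(b * x ** (j + 1) for j, b in enumerate(self.den)))
        return p / q


# Usage: the default instance is the identity; changing a few coefficients
# gives a bounded, smooth-looking nonlinearity such as x / (1 + x^2).
identity = RationalActivation()
bump = RationalActivation(num=[0.0, 1.0, 0.0, 0.0], den=[0.0, 1.0])
```

Because such an activation adds only a handful of scalar parameters per layer, it can in principle replace a fixed activation without altering the rest of the architecture or the optimizer, which is the drop-in property the abstract describes.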