ChebNet: Efficient and Stable Constructions of Deep Neural Networks with Rectified Power Units via Chebyshev Approximations
This work addresses a practical problem for researchers and practitioners in machine learning who need stable and efficient approximations of smooth functions, though it is incremental as it builds on prior RePU network constructions.
The paper tackles the stability issue in constructing deep neural networks with rectified power units (RePU) by proposing ChebNet, which uses Chebyshev polynomial approximations for more stable and efficient network construction, achieving spectral accuracy and better fine-tuning results than previous methods.
In a previous study [B. Li, S. Tang and H. Yu, Commun. Comput. Phy. 27(2):379-411, 2020], it is shown that deep neural networks built with rectified power units (RePU) as activation functions can give better approximation for sufficient smooth functions than those built with rectified linear units, by converting polynomial approximations using power series into deep neural networks with optimal complexity and no approximation error. However, in practice, power series approximations are not easy to obtain due to the associated stability issue. In this paper, we propose a new and more stable way to construct RePU deep neural networks based on Chebyshev polynomial approximations. By using a hierarchical structure of Chebyshev polynomial approximation in frequency domain, we obtain efficient and stable deep neural network construction, which we call ChebNet. The approximation of smooth functions by ChebNets is no worse than the approximation by deep RePU nets using power series. On the same time, ChebNets are much more stable. Numerical results show that the constructed ChebNets can be further fine-tuned to obtain much better results than those obtained by tuning deep RePU nets constructed by power series approach. As spectral accuracy is hard to obtain by direct training of deep neural networks, ChebNets provide a practical way to obtain spectral accuracy, it is expected to be useful in real applications that require efficient approximations of smooth functions.