The limitation of neural nets for approximation and optimization
This work examines the limitations of neural networks as surrogate models in optimization problems, offering incremental insights for researchers in computational optimization and machine learning.
The study assessed neural networks as surrogate models for approximating and minimizing objective functions in optimization. SiLU was the best-performing activation function for approximation. Neural networks were competitive with interpolation/regression models for zero- and first-order approximations but underperformed on second-order approximation. Combining a neural net activation function with the natural basis for quadratic interpolation/regression reduced the number of model parameters. However, gradient approximations from the surrogate models considered did not improve the performance of a state-of-the-art derivative-free optimization algorithm.
We are interested in assessing the use of neural networks as surrogate models to approximate and minimize objective functions in optimization problems. While neural networks are widely used for machine learning tasks such as classification and regression, their application to solving optimization problems has been limited. Our study begins by determining the best activation function for approximating the objective functions of popular nonlinear optimization test problems, and the evidence provided shows that SiLU has the best performance. We then analyze the accuracy of function value, gradient, and Hessian approximations for such objective functions obtained through interpolation/regression models and neural networks. When compared to interpolation/regression models, neural networks can deliver competitive zero- and first-order approximations (at a high training cost) but underperform on second-order approximation. However, we show that combining a neural net activation function with the natural basis for quadratic interpolation/regression can remove the need to include cross terms in the natural basis, leading to models with fewer parameters to determine. Lastly, we provide evidence that the performance of a state-of-the-art derivative-free optimization algorithm can hardly be improved when the gradient of an objective function is approximated using any of the surrogate models considered, including neural networks.
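To make the parameter-count remark concrete, the following is a minimal sketch of a quadratic regression surrogate built on the natural basis, with and without cross terms. It is an illustration under stated assumptions, not the models used in the study: the function names (`silu`, `natural_basis`, `fit_quadratic`, `gradient_at_origin`) and the test function are hypothetical, and the way the study combines the activation function with the basis is not reproduced here. SiLU is defined only for reference, using its standard form z * sigmoid(z).

```python
import numpy as np
from itertools import combinations


def silu(z):
    """SiLU activation, z * sigmoid(z) (standard definition, shown for reference)."""
    return z / (1.0 + np.exp(-z))


def natural_basis(X, cross_terms=True):
    """Evaluate the quadratic natural basis at each row of X.

    Column order: 1, x_i, x_i^2 / 2, then (optionally) x_i * x_j for i < j.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    m, n = X.shape
    cols = [np.ones(m)]
    cols += [X[:, i] for i in range(n)]
    cols += [0.5 * X[:, i] ** 2 for i in range(n)]
    if cross_terms:
        cols += [X[:, i] * X[:, j] for i, j in combinations(range(n), 2)]
    return np.column_stack(cols)


def fit_quadratic(X, f_vals, cross_terms=True):
    """Least-squares fit of the basis coefficients to sampled function values."""
    Phi = natural_basis(X, cross_terms)
    coef, *_ = np.linalg.lstsq(Phi, f_vals, rcond=None)
    return coef


def gradient_at_origin(coef, n):
    """Model gradient at x = 0: the coefficients of the linear terms."""
    return coef[1:n + 1]


# Hypothetical example: sample a 2-D quadratic and fit both model variants.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(20, 2))
f_vals = 1 + 2 * X[:, 0] - 3 * X[:, 1] + X[:, 0] ** 2 + X[:, 0] * X[:, 1]

full = fit_quadratic(X, f_vals, cross_terms=True)       # (n+1)(n+2)/2 parameters
reduced = fit_quadratic(X, f_vals, cross_terms=False)   # 2n+1 parameters
```

In dimension n, the full natural basis has (n+1)(n+2)/2 coefficients, while dropping the cross terms leaves 2n+1; for n = 10 that is 66 versus 21. The reduced model alone cannot represent the x_0 x_1 interaction in this example, which is the gap that, according to the abstract, the activation function helps to close.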