Formulations and scalability of neural network surrogates in nonlinear optimization problems
This work addresses scalability bottlenecks for neural network surrogates in optimization, particularly for power flow problems, but is incremental, since it compares existing formulations using a new implementation.
The authors compared full-space, reduced-space, and gray-box formulations for embedding trained neural networks in nonlinear optimization problems. They found that the gray-box formulation was the most scalable: with GPU acceleration, it solved a test problem containing a 590-million-parameter network in 2.5 times the time required for a simpler baseline problem without the stability constraint.
We compare full-space, reduced-space, and gray-box formulations for representing trained neural networks in nonlinear constrained optimization problems. We test these formulations on a transient stability-constrained, security-constrained alternating current optimal power flow (SCOPF) problem in which the transient stability criteria are represented by a trained neural network surrogate. Optimization problems are implemented in JuMP, and trained neural networks are embedded using a new Julia package, MathOptAI.jl. To study the bottlenecks of the three formulations, we use neural networks with up to 590 million trained parameters. The full-space formulation is bottlenecked by the linear solver used by the optimization algorithm, while the reduced-space formulation is bottlenecked by the algebraic modeling environment and derivative computations. The gray-box formulation is the most scalable and can solve the problem with the largest neural networks tested. It is bottlenecked by evaluation of the neural network's outputs and their derivatives, which may be accelerated with a graphics processing unit (GPU). Leveraging the gray-box formulation and GPU acceleration, we solve our test problem with our largest neural network surrogate in 2.5 times the time required for a simpler SCOPF problem without the stability constraint.
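To make the difference between the formulations concrete, the following is a minimal sketch in plain Python. It is not the MathOptAI.jl API and not the authors' implementation. It assumes a hypothetical two-input, one-output network with one tanh hidden layer and made-up weights (`W1`, `b1`, `W2`, `b2`); the helper names `matvec`, `full_space_residuals`, and `gray_box` are likewise hypothetical. In the full-space view, every pre-activation and activation is a decision variable tied down by an equality constraint, which enlarges the system the linear solver must factor. In the gray-box view, the solver sees only the outputs as a function of the inputs, and an external callback supplies the values and the Jacobian.

```python
import math

# Hypothetical 2-3-1 network: y = W2 . tanh(W1 x + b1) + b2 (weights are made up)
W1 = [[0.5, -1.0], [1.5, 0.2], [-0.3, 0.8]]
b1 = [0.1, -0.2, 0.0]
W2 = [[1.0, -0.5, 2.0]]
b2 = [0.3]


def matvec(W, v):
    """Dense matrix-vector product on nested lists."""
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]


def full_space_residuals(x, z, a, y):
    """Full-space: z (pre-activations), a (activations) and y (outputs) are all
    decision variables; the network enters the model as equality constraints
    residual == 0, one per neuron and per output."""
    r_z = [zi - (s + bi) for zi, s, bi in zip(z, matvec(W1, x), b1)]
    r_a = [ai - math.tanh(zi) for ai, zi in zip(a, z)]
    r_y = [yi - (s + bi) for yi, s, bi in zip(y, matvec(W2, a), b2)]
    return r_z + r_a + r_y


def gray_box(x):
    """Gray-box: the solver sees only y = f(x). f is evaluated outside the
    algebraic model and returns the outputs and the Jacobian dy/dx."""
    z = [s + bi for s, bi in zip(matvec(W1, x), b1)]
    a = [math.tanh(zi) for zi in z]
    y = [s + bi for s, bi in zip(matvec(W2, a), b2)]
    # Chain rule: dy/dx = W2 . diag(1 - a^2) . W1, since d tanh(z)/dz = 1 - tanh(z)^2
    jac = [
        [sum(W2[i][k] * (1.0 - a[k] ** 2) * W1[k][j] for k in range(len(a)))
         for j in range(len(x))]
        for i in range(len(y))
    ]
    return y, jac
```

For comparison, a reduced-space formulation would substitute the forward pass in `gray_box` symbolically into the model, so the outputs become one nested algebraic expression of the inputs with no intermediate variables; the modeling environment must then build and differentiate that expression itself, which is consistent with the bottleneck described above.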