Parallel Trust-Region Approaches in Neural Network Training: Beyond Traditional Methods
This work targets two practical obstacles in neural network training, computational cost and hyper-parameter tuning, and is aimed at researchers and practitioners. The contribution appears incremental, since it builds on existing trust-region frameworks.
The authors propose APTS, a parallelizable trust-region variant for neural network training that removes the need for costly hyper-parameter (step-size) tuning and guarantees global convergence. In their numerical experiments, it performs competitively with methods such as SGD and Adam.
We propose to train neural networks (NNs) using a novel variant of the "Additively Preconditioned Trust-region Strategy" (APTS). The proposed method applies a parallelizable additive domain decomposition approach to the neural network's parameters. Because it is built upon the trust-region (TR) framework, the APTS method ensures global convergence towards a minimizer. Moreover, it eliminates the need for computationally expensive hyper-parameter tuning, as the TR algorithm automatically determines the step size in each iteration. We demonstrate the capabilities, strengths, and limitations of the proposed APTS training method through a series of numerical experiments. The numerical study includes a comparison with widely used training methods such as SGD, Adam, LBFGS, and the standard TR method.
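To make the two ingredients concrete (a trust region that sets the step size automatically, and an additive decomposition of the parameters into independently solvable pieces), here is a minimal sketch under several stated assumptions. It is not the authors' APTS implementation. A least-squares loss on a linear model stands in for an NN loss. Each local solver takes first-order steepest-descent (Cauchy) steps. The global acceptance test compares the actual decrease against the sum of local decreases, which is an assumption in the spirit of additive Schwarz methods. The names `tr_cauchy` and `apts_sketch` are hypothetical.

```python
import numpy as np


def loss(w, X, y):
    # Least-squares loss of a linear model: a stand-in for an NN training loss.
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def grad(w, X, y):
    return X.T @ (X @ w - y) / len(y)


def tr_cauchy(f, g, x, radius, iters, eta=0.1, shrink=0.5, grow=2.0, radius_max=10.0):
    """First-order trust-region loop: a steepest-descent (Cauchy) step to the TR
    boundary, accepted or rejected by the ratio of actual to predicted reduction."""
    fx = f(x)
    for _ in range(iters):
        gx = g(x)
        gn = np.linalg.norm(gx)
        if gn < 1e-12:
            break
        s = -(radius / gn) * gx          # step of length `radius` along -gradient
        pred = radius * gn               # decrease predicted by the linear model
        f_trial = f(x + s)
        rho = (fx - f_trial) / pred      # actual / predicted reduction
        if rho > eta:                    # accept; enlarge the region if the model fits well
            x, fx = x + s, f_trial
            if rho > 0.75:
                radius = min(grow * radius, radius_max)
        else:                            # reject; shrink the region (no learning rate needed)
            radius *= shrink
    return x, fx, radius


def apts_sketch(w, X, y, n_blocks=2, radius=1.0, outer_iters=30, local_iters=3, eta=0.1):
    """Hypothetical additive TR scheme: disjoint parameter blocks solve local TR
    problems, the corrections are summed, and a global ratio test accepts or rejects."""
    f = lambda v: loss(v, X, y)
    blocks = np.array_split(np.arange(w.size), n_blocks)
    history = [f(w)]
    for _ in range(outer_iters):
        f0 = history[-1]
        # Per-step local radius chosen so the summed step stays inside the global radius:
        # each block moves at most radius/sqrt(n_blocks), and the blocks are orthogonal.
        r_step = radius / np.sqrt(n_blocks) / local_iters
        step = np.zeros_like(w)
        local_decrease = 0.0
        for idx in blocks:  # independent subproblems: this loop could run in parallel
            def f_loc(z, idx=idx):
                v = w.copy()
                v[idx] = z
                return f(v)

            def g_loc(z, idx=idx):
                v = w.copy()
                v[idx] = z
                return grad(v, X, y)[idx]

            z, fz, _ = tr_cauchy(f_loc, g_loc, w[idx].copy(), r_step, local_iters,
                                 eta=eta, radius_max=r_step)
            step[idx] = z - w[idx]
            local_decrease += f0 - fz
        if local_decrease > 0.0:
            f_new = f(w + step)
            rho = (f0 - f_new) / local_decrease  # global acceptance test (assumed form)
        else:
            rho = -np.inf
        if rho > eta:
            w = w + step
            history.append(f_new)
            if rho > 0.75:
                radius = min(2.0 * radius, 10.0)
        else:
            radius *= 0.5
            history.append(f0)
    return w, history


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])
w, hist = apts_sketch(np.zeros(4), X, y)
print(f"loss: {hist[0]:.4f} -> {hist[-1]:.2e}")
```

No learning rate appears anywhere in the sketch. The global and local radii grow or shrink only according to how well the model predicted the observed decrease. Because only steps that reduce the loss are accepted, the recorded loss never increases. In a real NN setting, the blocks would typically correspond to groups of layers or weights distributed across workers.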