Training Deep Neural Networks via Branch-and-Bound
This work addresses the challenge of efficient and optimal training for deep neural networks, which is crucial for researchers and practitioners in computer vision, though it appears incremental as it builds on existing branch-and-bound and optimization techniques.
The paper tackles the problem of training deep neural networks by proposing BPGrad, a novel approximate algorithm based on branch-and-bound that adaptively determines step sizes using Lipschitz continuity assumptions, achieving optimal solutions within finite iterations and demonstrating favorable performance in object recognition, detection, and segmentation tasks compared to other stochastic optimization methods.
In this paper, we propose BPGrad, a novel approximate algorithm for deep nueral network training, based on adaptive estimates of feasible region via branch-and-bound. The method is based on the assumption of Lipschitz continuity in objective function, and as a result, it can adaptively determine the step size for the current gradient given the history of previous updates. We prove that, by repeating such a branch-and-pruning procedure, it can achieve the optimal solution within finite iterations. A computationally efficient solver based on BPGrad has been proposed to train the deep neural networks. Empirical results demonstrate that BPGrad solver works well in practice and compares favorably to other stochastic optimization methods in the tasks of object recognition, detection, and segmentation. The code is available at \url{https://github.com/RyanCV/BPGrad}.