Ranger21: a synergistic deep learning optimizer
This work addresses the need for more effective and composable optimizers in deep learning. Its contribution is incremental in that it builds on existing components.
The authors observe that incremental optimizer improvements are underutilized and introduce Ranger21, which combines AdamW with eight components from the literature. They report significantly improved validation accuracy and training speed, smoother training curves, and the ability to train a ResNet50 on ImageNet2012 without Batch Normalization layers, a setting in which AdamW fails.
As optimizers are critical to the performance of neural networks, a large number of papers innovating on the subject are published every year. However, while most of these publications provide incremental improvements to existing algorithms, they tend to be presented as new optimizers rather than as composable algorithms. Thus, many worthwhile improvements are rarely seen outside of their initial publication. Taking advantage of this untapped potential, we introduce Ranger21, a new optimizer which combines AdamW with eight components, carefully selected after reviewing and testing ideas from the literature. We found that the resulting optimizer provides significantly improved validation accuracy and training speed, smoother training curves, and is even able to train a ResNet50 on ImageNet2012 without Batch Normalization layers, a problem on which AdamW stays systematically stuck in a bad initial state.
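To make the idea of composable components concrete, the following is a minimal sketch of an AdamW update that accepts a list of gradient transforms applied before the step. It is not the Ranger21 implementation. The class `ComposableAdamW` and the helpers `center_gradient` and `clip_by_norm` are hypothetical names chosen for illustration, and the two transforms are generic examples rather than a statement of which eight components the authors selected.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
GradTransform = Callable[[Vector], Vector]


def center_gradient(grad: Vector) -> Vector:
    """Illustrative component: subtract the mean so the gradient sums to zero."""
    mean = sum(grad) / len(grad)
    return [g - mean for g in grad]


def clip_by_norm(max_norm: float) -> GradTransform:
    """Illustrative component: rescale the gradient if its L2 norm exceeds max_norm."""
    def transform(grad: Vector) -> Vector:
        norm = math.sqrt(sum(g * g for g in grad))
        if norm == 0.0 or norm <= max_norm:
            return list(grad)
        scale = max_norm / norm
        return [g * scale for g in grad]
    return transform


class ComposableAdamW:
    """AdamW on a flat parameter vector, with pluggable gradient transforms (a sketch)."""

    def __init__(self, params: Sequence[float], lr: float = 1e-3,
                 betas=(0.9, 0.999), eps: float = 1e-8,
                 weight_decay: float = 1e-2,
                 transforms: Sequence[GradTransform] = ()):
        self.params = list(params)
        self.lr, self.betas, self.eps = lr, betas, eps
        self.weight_decay = weight_decay
        self.transforms = list(transforms)
        self.m = [0.0] * len(self.params)  # first-moment estimate
        self.v = [0.0] * len(self.params)  # second-moment estimate
        self.t = 0                         # step counter for bias correction

    def step(self, grad: Vector) -> Vector:
        # Each component is an independent function applied in order.
        for transform in self.transforms:
            grad = transform(grad)
        self.t += 1
        b1, b2 = self.betas
        for i, g in enumerate(grad):
            self.m[i] = b1 * self.m[i] + (1 - b1) * g
            self.v[i] = b2 * self.v[i] + (1 - b2) * g * g
            m_hat = self.m[i] / (1 - b1 ** self.t)
            v_hat = self.v[i] / (1 - b2 ** self.t)
            adam_update = m_hat / (math.sqrt(v_hat) + self.eps)
            # Decoupled weight decay: applied to the parameter, not the gradient.
            self.params[i] -= self.lr * (adam_update + self.weight_decay * self.params[i])
        return self.params


def quadratic_loss(x: Vector) -> float:
    return sum(xi * xi for xi in x)


def quadratic_grad(x: Vector) -> Vector:
    return [2.0 * xi for xi in x]


if __name__ == "__main__":
    opt = ComposableAdamW([1.0, -2.0, 3.0], lr=0.1, transforms=[clip_by_norm(1.0)])
    for _ in range(200):
        opt.step(quadratic_grad(opt.params))
    print(quadratic_loss(opt.params))
```

Because each transform is a plain function, a component can be added, removed, or reordered without touching the core update, which is the kind of composability the abstract argues is missing when improvements are published as standalone optimizers.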