Domain-independent Dominance of Adaptive Methods
This work addresses hyperparameter tuning of optimizers, a practical burden for machine learning practitioners. Its contribution is incremental, since it builds on existing adaptive methods such as Adam.
The paper introduces AvaGrad, an adaptive optimizer that decouples learning rate from adaptability in order to simplify hyperparameter tuning. AvaGrad matches the best generalization accuracy of existing optimizers (SGD or adaptive) on image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank).
From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks.
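The coupling mentioned above can be illustrated with a small sketch. In the standard Adam update, each parameter's step is scaled by 1 / (sqrt(v_i) + eps), so raising eps (making the method less adaptive) also shrinks the overall step, and the learning rate has to be retuned to compensate. The second function below removes that effect by rescaling the per-parameter factors to unit root-mean-square. This normalization is an assumption made for illustration and is not claimed to be the exact AvaGrad update. The names `adam_scaling`, `normalized_scaling`, and `rms`, and the values in `v`, are hypothetical.

```python
import math


def rms(xs):
    """Root-mean-square of a list of floats."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


def adam_scaling(v, eps):
    """Per-parameter scaling used by Adam: 1 / (sqrt(v_i) + eps)."""
    return [1.0 / (math.sqrt(vi) + eps) for vi in v]


def normalized_scaling(v, eps):
    """Illustrative decoupled variant (an assumption, not the paper's
    stated algorithm): rescale the Adam factors so their RMS is 1.
    eps then changes only the relative per-parameter scaling, not the
    overall step size."""
    eta = adam_scaling(v, eps)
    scale = rms(eta)
    return [e / scale for e in eta]


# Hypothetical second-moment estimates for three parameters.
v = [1e-8, 4e-8, 9e-8]

for eps in (1e-8, 1e-2, 1.0, 10.0):
    coupled = rms(adam_scaling(v, eps))
    decoupled = rms(normalized_scaling(v, eps))
    print(f"eps={eps:g}  adam_rms={coupled:.4g}  normalized_rms={decoupled:.4g}")
```

Under these assumptions, the RMS of the Adam scaling falls roughly in proportion to 1/eps once eps dominates sqrt(v), while the normalized scaling keeps an RMS of 1 for every eps. This is one way to read the claim that the learning rate and the adaptability parameter can be searched independently.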