LG · ML · Dec 4, 2019

Domain-independent Dominance of Adaptive Methods

arXiv:1912.01823v3 · 23 citations
Originality: Incremental advance
AI Analysis

This work addresses the difficulty practitioners face in tuning optimizer hyperparameters. It is incremental in that it builds on existing adaptive methods such as Adam.

The paper introduces AvaGrad, an adaptive optimizer that decouples learning rate from adaptability in order to simplify hyperparameter tuning. The results show that AvaGrad matches the best generalization accuracy of existing optimizers on image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank).

From a simplified analysis of adaptive methods, we derive AvaGrad, a new optimizer which outperforms SGD on vision tasks when its adaptability is properly tuned. We observe that the power of our method is partially explained by a decoupling of learning rate and adaptability, greatly simplifying hyperparameter search. In light of this observation, we demonstrate that, against conventional wisdom, Adam can also outperform SGD on vision tasks, as long as the coupling between its learning rate and adaptability is taken into account. In practice, AvaGrad matches the best results, as measured by generalization accuracy, delivered by any existing optimizer (SGD or adaptive) across image classification (CIFAR, ImageNet) and character-level language modelling (Penn Treebank) tasks.
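
The coupling between learning rate and adaptability mentioned above can be illustrated with a small sketch. It assumes the standard Adam-style per-coordinate scaling `1 / (sqrt(v) + eps)` with bias correction omitted, and it uses a root-mean-square normalisation of that scaling vector as one possible way to remove the coupling. The function names and the moment values are hypothetical, and the exact AvaGrad update should be taken from the paper rather than from this sketch.

```python
import math

def rms(x):
    """Root-mean-square of a list of floats."""
    return math.sqrt(sum(xi * xi for xi in x) / len(x))

def adam_scaling(v, eps):
    """Adam-style per-coordinate scaling 1 / (sqrt(v) + eps).
    Its overall magnitude depends on eps, so eps and the learning
    rate jointly set the effective step size."""
    return [1.0 / (math.sqrt(vi) + eps) for vi in v]

def normalized_scaling(v, eps):
    """Same scaling divided by its RMS (an assumed normalisation).
    eps now changes only the relative per-coordinate weighting;
    the overall magnitude of the scaling vector is fixed at 1."""
    eta = adam_scaling(v, eps)
    r = rms(eta)
    return [e / r for e in eta]

def step(m, scaling, lr):
    """Parameter update: lr * scaling * m, coordinate-wise."""
    return [lr * s * mi for s, mi in zip(scaling, m)]

# Hypothetical first and second moment estimates for four parameters.
m = [0.1, -0.2, 0.05, 0.3]
v = [1e-4, 4e-4, 1e-6, 9e-4]

for eps in (1e-8, 1e-3, 1e-1):
    a = rms(adam_scaling(v, eps))
    n = rms(normalized_scaling(v, eps))
    print(f"eps={eps:g}  RMS(adam scaling)={a:.3e}  RMS(normalized)={n:.3e}")
```

With the Adam-style scaling, a large `eps` makes every coordinate approach `1 / eps`, so the update behaves like momentum SGD with learning rate `lr / eps`. Changing `eps` therefore also changes the effective learning rate. With the normalised scaling, the two knobs can be searched more independently.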

Code Implementations: 1 repo
