LG · OC · ML · Nov 17, 2022

VeLO: Training Versatile Learned Optimizers by Scaling Up

Anthropic, DeepMind
arXiv:2211.09760v1 · 81 citations · h-index: 63 · Has Code
Originality: Highly original
AI Analysis

This work addresses the need for more efficient and adaptive optimization methods in deep learning, offering a novel approach that could reduce manual tuning efforts.

The authors aim to replace hand-designed optimizers in deep learning with a versatile learned optimizer. Trained with large-scale compute, it adapts automatically to each optimization task, requires no hyperparameter tuning, and shows compelling performance.

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
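
To make the phrase "a small neural network that ingests gradients and outputs parameter updates" concrete, here is a minimal sketch in plain Python. It is not the VeLO architecture or its meta-training procedure. The features (raw gradient plus a momentum average), the network size, and all names (`optimizer_network`, `learned_optimizer_step`, `W1`, `W2`, `STEP_SCALE`) are assumptions made for illustration. The weights are hand-picked placeholders standing in for values that meta-training would normally produce.

```python
import math

# Meta-parameters of the optimizer network. In a learned optimizer these would
# come from meta-training; the values below are hand-picked placeholders.
W1 = [[1.0, 0.5], [0.5, 1.0], [2.0, 0.25]]  # 3 hidden units x 2 input features
W2 = [0.2, 0.2, 0.1]                        # one output weight per hidden unit
STEP_SCALE = 0.1                            # scales the network output
MOMENTUM_DECAY = 0.9                        # decay of the running gradient average


def optimizer_network(grad_value, momentum_value):
    """Tiny bias-free MLP: per-parameter features -> scalar update direction."""
    hidden = [math.tanh(w_g * grad_value + w_m * momentum_value) for w_g, w_m in W1]
    return sum(w * h for w, h in zip(W2, hidden))


def learned_optimizer_step(params, grads, momenta):
    """Apply the same small network independently to every parameter."""
    new_momenta = [MOMENTUM_DECAY * m + (1.0 - MOMENTUM_DECAY) * g
                   for m, g in zip(momenta, grads)]
    new_params = [p - STEP_SCALE * optimizer_network(g, m)
                  for p, g, m in zip(params, grads, new_momenta)]
    return new_params, new_momenta


def quadratic_loss(params):
    """Toy task: f(x) = 0.5 * sum(x_i^2)."""
    return 0.5 * sum(p * p for p in params)


def quadratic_grad(params):
    """Gradient of the toy task, which is simply x itself."""
    return list(params)


def run(params, num_steps):
    """Optimize the toy task; note that no learning rate is supplied by the caller."""
    momenta = [0.0] * len(params)
    for _ in range(num_steps):
        grads = quadratic_grad(params)
        params, momenta = learned_optimizer_step(params, grads, momenta)
    return params


if __name__ == "__main__":
    start = [1.0, -2.0]
    end = run(start, 10)
    print(quadratic_loss(start), quadratic_loss(end))
```

The caller passes only parameters and gradients. Everything that would usually be a hyperparameter lives inside the optimizer's own weights, which is the sense in which a learned optimizer can avoid per-task tuning once those weights have been meta-trained.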

Code Implementations: 2 repos