LG, ML · Oct 18, 2019

MARTHE: Scheduling the Learning Rate Via Online Hypergradients

Michele Donini, Luca Franceschi, Massimiliano Pontil, Orchid Majumder, Paolo Frasconi

arXiv:1910.08525v410.713 citations · h-index: 62 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses hyperparameter optimization of the learning rate in machine learning. It proposes a method that combines two existing techniques, RTHO and HD, with the aim of more stable learning rate schedules and better generalization.

The paper tackles the problem of automatically fitting task-specific learning rate schedules for better generalization. It introduces MARTHE, an online algorithm that uses cheap hypergradient approximations and past information from the optimization trajectory to simulate future behavior. The resulting schedules are reported to be more stable and to yield models that generalize better.

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule -- the hypergradient. Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO (Franceschi et al., 2017) and HD (Baydin et al., 2018), and is able to produce learning rate schedules that are more stable, leading to models that generalize better.
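
To make the idea of an online hypergradient concrete, here is a minimal sketch on a toy quadratic problem. It is an illustration under stated assumptions, not the authors' implementation: the function `run_schedule`, the discount parameter `mu`, the hyper-learning rate `beta`, and both toy losses are hypothetical choices made for this example. The sketch assumes plain gradient descent on the training loss and a discounted forward-mode recursion for the derivative of the weights with respect to the learning rate. With `mu = 0` only the most recent step is taken into account, which resembles an HD-style update driven by the validation error; with `mu = 1` the whole past trajectory is accumulated, which resembles an RTHO-style update.

```python
import numpy as np

def run_schedule(mu, steps=20, eta0=0.01, beta=1e-4):
    """Adapt the learning rate online on a toy quadratic (illustrative only).

    mu   : assumed discount in [0, 1] on past trajectory information
    beta : assumed step size for the learning-rate update
    """
    # Toy training loss L(w) = 0.5 * w^T A w - b^T w, so its Hessian is A.
    A = np.diag([1.0, 10.0])
    b = np.array([1.0, 1.0])
    # Toy validation error E(w) = 0.5 * ||w - w_val||^2.
    w_val = np.array([0.9, 0.12])

    w = np.zeros(2)
    z = np.zeros(2)  # running approximation of d w_t / d eta
    eta = eta0
    etas = []
    for _ in range(steps):
        grad_train = A @ w - b
        # Differentiate the step w_{t+1} = w_t - eta * grad_train w.r.t. eta:
        # the past contribution (I - eta * A) z is discounted by mu.
        z = mu * (z - eta * (A @ z)) - grad_train
        w = w - eta * grad_train
        # Approximate hypergradient: validation gradient times dw/d eta.
        grad_val = w - w_val
        hypergrad = grad_val @ z
        # Gradient step on the learning rate, kept non-negative.
        eta = max(eta - beta * hypergrad, 0.0)
        etas.append(eta)
    val_err = 0.5 * np.sum((w - w_val) ** 2)
    return np.array(etas), val_err

etas_short, err_short = run_schedule(mu=0.0)  # short-horizon, HD-like
etas_long, err_long = run_schedule(mu=1.0)    # long-horizon, RTHO-like
```

Intermediate values of `mu` trade off between the two extremes. In a real network the Hessian-vector product `A @ z` would come from automatic differentiation rather than an explicit matrix.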
