LG · ML · Oct 18, 2019

MARTHE: Scheduling the Learning Rate Via Online Hypergradients

arXiv:1910.08525v4 · 13 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of hyperparameter optimization for learning rates in machine learning, offering a novel method that combines and improves upon existing techniques to enhance training stability and generalization.

The paper tackles the problem of automatically fitting task-specific learning rate schedules for better generalization by introducing MARTHE, an online algorithm that uses cheap hypergradient approximations and past optimization information to simulate future behavior, resulting in more stable schedules and improved model generalization.

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization, aiming at good generalization. We describe the structure of the gradient of a validation error w.r.t. the learning rate schedule (the hypergradient). Based on this, we introduce MARTHE, a novel online algorithm guided by cheap approximations of the hypergradient that uses past information from the optimization trajectory to simulate future behaviour. It interpolates between two recent techniques, RTHO (Franceschi et al., 2017) and HD (Baydin et al., 2018), and is able to produce learning rate schedules that are more stable, leading to models that generalize better.
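The interpolation described in the abstract can be illustrated with a small sketch. It is a simplified, hypothetical illustration, not the authors' released implementation. It assumes a diagonal quadratic training loss L(w) = 0.5 Σ h_i w_i² − b·w, a validation loss E(w) = 0.5 ||w − w_val||², plain gradient descent, and a forward-mode tangent Z ≈ dw/dη whose past is discounted by a factor `mu`. With `mu = 0` the learning-rate update reduces to a one-step, HD-style rule (exactly HD-style if E were the training loss). With `mu = 1` the full undiscounted tangent is kept, in the spirit of RTHO. The function name `marthe_style_schedule`, the hyper-learning rate `beta`, and the clipping of η at zero are our own choices.

```python
from typing import List, Tuple


def marthe_style_schedule(
    h: List[float],      # diagonal of the training-loss Hessian (toy quadratic, assumed)
    b: List[float],      # linear term of the training loss
    w_val: List[float],  # minimiser of the toy validation loss E(w) = 0.5*||w - w_val||^2
    w0: List[float],     # initial weights
    eta0: float,         # initial learning rate
    beta: float,         # hyper-learning rate for eta (hypothetical name)
    mu: float,           # discount in [0, 1]: 0 ~ HD-style, 1 ~ RTHO-style (assumption)
    steps: int,
) -> Tuple[List[float], List[float]]:
    """Run gradient descent while adapting eta with a discounted hypergradient estimate."""
    w = list(w0)
    z = [0.0] * len(w)  # tangent vector approximating dw/d(eta)
    eta = eta0
    etas: List[float] = []
    for _ in range(steps):
        # Gradient of the training loss at the current weights: H w - b
        grad_train = [hi * wi - bi for hi, wi, bi in zip(h, w, b)]
        # Discounted forward-mode tangent: Z_{t+1} = mu * (I - eta*H) Z_t - grad L(w_t)
        z = [mu * (1.0 - eta * hi) * zi - gi for hi, zi, gi in zip(h, z, grad_train)]
        # Plain gradient-descent step on the training loss
        w = [wi - eta * gi for wi, gi in zip(w, grad_train)]
        # Gradient of the validation loss at the new weights
        grad_val = [wi - vi for wi, vi in zip(w, w_val)]
        # Hypergradient estimate: dE/d(eta) ~ grad E(w_{t+1}) . Z_{t+1}
        hypergrad = sum(gv * zi for gv, zi in zip(grad_val, z))
        # Gradient step on the learning rate, kept non-negative
        eta = max(eta - beta * hypergrad, 0.0)
        etas.append(eta)
    return w, etas


if __name__ == "__main__":
    w, etas = marthe_style_schedule(
        h=[1.0], b=[1.0], w_val=[1.0], w0=[0.0],
        eta0=0.1, beta=0.01, mu=0.9, steps=200,
    )
    print("final w:", w, "final eta:", etas[-1])
```

In this one-dimensional toy, the training and validation minima coincide. Every step therefore moves w toward the validation optimum, the hypergradient stays negative, and η rises from 0.1 while w converges to 1. Varying `mu` between 0 and 1 changes how much of the past trajectory informs each learning-rate update.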

Code Implementations: 1 repo
