LG ML Oct 23, 2023

Studying K-FAC Heuristics by Viewing Adam through a Second-Order Lens

Ross M. Clarke, José Miguel Hernández-Lobato

arXiv:2310.14963v36.63 citations h-index: 10 Has Code

Originality: Incremental advance

AI Analysis

This work targets the gap between computationally cheap first-order optimisers and theoretically efficient second-order optimisers in deep learning. It is an incremental advance, since it builds on the existing Adam and K-FAC methods.

The paper asks how much stabilising heuristics, as opposed to the curvature model itself, contribute to the performance of second-order optimisation methods in deep learning. It introduces AdamQLR, which combines K-FAC's damping and learning rate selection heuristics with Adam's update directions, and finds that an untuned AdamQLR setting can achieve performance versus runtime comparable to tuned benchmarks.

Research into optimisation for deep learning is characterised by a tension between the computational efficiency of first-order, gradient-based methods (such as SGD and Adam) and the theoretical efficiency of second-order, curvature-based methods (such as quasi-Newton methods and K-FAC). Noting that second-order methods often only function effectively with the addition of stabilising heuristics (such as Levenberg-Marquardt damping), we ask how much these (as opposed to the second-order curvature model) contribute to second-order algorithms' performance. We thus study AdamQLR: an optimiser combining damping and learning rate selection techniques from K-FAC (Martens & Grosse, 2015) with the update directions proposed by Adam, inspired by considering Adam through a second-order lens. We evaluate AdamQLR on a range of regression and classification tasks at various scales and hyperparameter tuning methodologies, concluding K-FAC's adaptive heuristics are of variable standalone general effectiveness, and finding an untuned AdamQLR setting can achieve comparable performance vs runtime to tuned benchmarks.
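To make the combination described in the abstract concrete, the following is a minimal sketch of one way the pieces could fit together: an Adam update direction, a learning rate chosen by minimising a damped quadratic model along that direction, and a Levenberg-Marquardt damping adjustment driven by a reduction ratio. It is not the authors' implementation. The toy quadratic loss, the use of its exact Hessian `A` as the curvature matrix, the adjustment factor `omega=0.5`, and the function names `adam_direction`, `quadratic_lr`, `update_damping` and `run_sketch` are all assumptions made for illustration.

```python
import numpy as np


def adam_direction(g, m, v, t, beta1=0.9, beta2=0.999, eps=1e-8):
    """Unscaled Adam direction (no learning rate applied) and updated moments."""
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g * g
    m_hat = m / (1.0 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** t)  # bias-corrected second moment
    return -m_hat / (np.sqrt(v_hat) + eps), m, v


def quadratic_lr(g, d, C, damping):
    """Step size a minimising a*(g.d) + 0.5*a^2 * d.(C + damping*I).d."""
    return -(g @ d) / (d @ (C @ d) + damping * (d @ d))


def update_damping(rho, damping, omega=0.5):
    """Levenberg-Marquardt style rule; omega is an illustrative value."""
    if rho > 0.75:  # model predicted the loss change well: trust it more
        return damping * omega
    if rho < 0.25:  # model predicted poorly: trust it less
        return damping / omega
    return damping


def run_sketch(steps=50, damping=1.0):
    # ASSUMPTION: toy loss 0.5*th.A.th - b.th; its exact Hessian A stands in
    # for the curvature matrix a real optimiser would have to estimate.
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])

    def loss(th):
        return 0.5 * th @ A @ th - b @ th

    theta = np.zeros(2)
    m, v = np.zeros(2), np.zeros(2)
    losses = [loss(theta)]
    for t in range(1, steps + 1):
        g = A @ theta - b
        d, m, v = adam_direction(g, m, v, t)
        alpha = quadratic_lr(g, d, A, damping)
        # Loss change predicted by the damped quadratic model for this step.
        curv = d @ (A @ d) + damping * (d @ d)
        predicted = alpha * (g @ d) + 0.5 * alpha ** 2 * curv
        if predicted == 0.0:  # gradient orthogonal to direction: nothing to do
            break
        new_theta = theta + alpha * d
        # Reduction ratio: actual loss change over predicted loss change.
        rho = (loss(new_theta) - loss(theta)) / predicted
        damping = update_damping(rho, damping)
        theta = new_theta
        losses.append(loss(theta))
    return losses
```

Calling `run_sketch()` returns the loss after each step on the toy problem. Because the toy loss is exactly quadratic and `A` is its true Hessian, the damped step is always a shortened version of the exact line minimiser, so the loss cannot increase here; that guarantee does not carry over to neural network losses or approximate curvature.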
