LGOCDec 10, 2020

A Study of Condition Numbers for First-Order Optimization

arXiv:2012.05782v20.0023 citations
AI Analysis65

This work addresses the problem of robust hyperparameter tuning for first-order optimization algorithms, which is crucial for practitioners in machine learning and optimization, by identifying limitations of current metrics and proposing more stable alternatives.

This paper investigates the stability of smoothness and strong convexity metrics, commonly used for tuning first-order optimization algorithms (FOA), under small perturbations to the objective function. It finds that these traditional metrics can be heavily impacted by arbitrarily small perturbations, leading to conservative tunings and convergence issues, despite the FOA's behavior remaining largely unaffected. The authors propose and study alternative, continuous metrics that are robust to such perturbations and provide their guaranteed convergence rates for Gradient Descent.

The study of first-order optimization algorithms (FOA) typically starts with assumptions on the objective functions, most commonly smoothness and strong convexity. These metrics are used to tune the hyperparameters of FOA. We introduce a class of perturbations quantified via a new norm, called *-norm. We show that adding a small perturbation to the objective function has an equivalently small impact on the behavior of any FOA, which suggests that it should have a minor impact on the tuning of the algorithm. However, we show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations, leading to excessively conservative tunings and convergence issues. In view of these observations, we propose a notion of continuity of the metrics, which is essential for a robust tuning strategy. Since smoothness and strong convexity are not continuous, we propose a comprehensive study of existing alternative metrics which we prove to be continuous. We describe their mutual relations and provide their guaranteed convergence rates for the Gradient Descent algorithm accordingly tuned. Finally we discuss how our work impacts the theoretical understanding of FOA and their performances.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes