LGJun 26, 2023

Gradient Descent Converges Linearly for Logistic Regression on Separable Data

arXiv:2306.14381v18 citationsh-index: 42
Originality Incremental advance
AI Analysis

This addresses the theoretical understanding of optimization methods for machine learning practitioners, providing a counterintuitive result that linear convergence is possible without strong convexity, which is incremental but impactful for specific models.

The paper tackles the convergence of gradient descent for logistic regression on separable data, showing that with a variable learning rate, the loss converges linearly to near-optimality, with error decaying exponentially in iterations and polynomially in solution magnitude, and applies this to sparse logistic regression for an exponential improvement in sparsity-error tradeoff.

We show that running gradient descent with variable learning rate guarantees loss $f(x) \leq 1.1 \cdot f(x^*) + ε$ for the logistic regression objective, where the error $ε$ decays exponentially with the number of iterations and polynomially with the magnitude of the entries of an arbitrary fixed solution $x^*$. This is in contrast to the common intuition that the absence of strong convexity precludes linear convergence of first-order methods, and highlights the importance of variable learning rates for gradient descent. We also apply our ideas to sparse logistic regression, where they lead to an exponential improvement of the sparsity-error tradeoff.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes