LG · OC · Jul 15, 2025

Gradient Descent on Logistic Regression: Do Large Step-Sizes Work with Data on the Sphere?

arXiv:2507.11228v1 · 2 citations · h-index: 19
Originality: Synthesis-oriented
AI Analysis

This paper addresses a theoretical question in optimization for machine learning. The contribution is incremental: it builds on prior work on the convergence properties of gradient descent for logistic regression.

The paper investigates whether gradient descent on logistic regression converges globally with large step sizes (below the stability threshold) when all data points have equal magnitude. It finds that global convergence holds in one dimension, but that cycles can still occur in higher dimensions.

Gradient descent (GD) on logistic regression has many fascinating properties. When the dataset is linearly separable, it is known that the iterates converge in direction to the maximum-margin separator regardless of how large the step size is. In the non-separable case, however, it has been shown that GD can exhibit a cycling behaviour even when the step size is still below the stability threshold $2/\lambda$, where $\lambda$ is the largest eigenvalue of the Hessian at the solution. This short paper explores whether restricting the data to have equal magnitude is a sufficient condition for global convergence under any step size below the stability threshold. We prove that this is true in a one-dimensional space, but in higher dimensions cycling behaviour can still occur. We hope to inspire further studies on quantifying how common these cycles are in realistic datasets, as well as finding sufficient conditions to guarantee global convergence with large step sizes.
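To make the one-dimensional claim concrete, the following is a minimal sketch, not the authors' code. It assumes the loss is the mean logistic loss $\frac{1}{n}\sum_i \log(1 + e^{-y_i x_i w})$ (the paper's normalization may differ), and uses a hypothetical four-point, non-separable dataset with $|x_i| = 1$. For this toy dataset the minimizer is $w^* = \log 3$ and the Hessian there is $\lambda = 0.1875$, so the stability threshold is $2/\lambda \approx 10.67$. The step size is set to 90% of that threshold, and the helper names (`sigmoid`, `grad`, `hess`, `run_gd`) are illustrative.

```python
import math

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def grad(w, xs, ys):
    # Derivative of the mean logistic loss (assumed normalization: 1/n).
    n = len(xs)
    return sum(-y * x * sigmoid(-y * x * w) for x, y in zip(xs, ys)) / n

def hess(w, xs, ys):
    # Second derivative of the mean logistic loss in one dimension.
    n = len(xs)
    return sum(x * x * sigmoid(y * x * w) * sigmoid(-y * x * w)
               for x, y in zip(xs, ys)) / n

def run_gd(w0, step, xs, ys, iters):
    # Plain gradient descent with a constant step size.
    w = w0
    for _ in range(iters):
        w -= step * grad(w, xs, ys)
    return w

# Hypothetical toy data: every |x_i| equals 1 and the set is not separable,
# because y_i * x_i is +1 for three points and -1 for one.
xs = [1.0, -1.0, 1.0, 1.0]
ys = [1.0, -1.0, 1.0, -1.0]

w_star = math.log(3.0)        # closed-form minimizer for this toy data
lam = hess(w_star, xs, ys)    # Hessian at the solution (0.1875 here)
step = 0.9 * 2.0 / lam        # large step, still below the threshold 2 / lam

w_final = run_gd(0.0, step, xs, ys, 500)
print(w_final, w_star)
```

The iterates oscillate around $w^*$ with shrinking amplitude, since the local contraction factor is $|1 - \eta\lambda| = 0.8$ for this step size. The sketch only illustrates the one-dimensional result; it does not reproduce the higher-dimensional cycling the paper reports.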
