Adaptive proximal gradient methods are universal without approximation
This work provides theoretical guarantees for optimization methods in machine learning under broader conditions, though it appears incremental as it extends existing adaptive schemes to more general settings.
The authors tackled the problem of adaptive proximal gradient methods requiring restrictive Lipschitz assumptions by proving convergence under weaker local Hölder gradient continuity, covering functions like continuously differentiable semi-algebraic ones. They achieved this without approximation or linesearch procedures, and demonstrated numerical comparisons on machine learning tasks.
We show that adaptive proximal gradient methods for convex problems are not restricted to traditional Lipschitzian assumptions. Our analysis reveals that a class of linesearch-free methods is still convergent under mere local Hölder gradient continuity, covering in particular continuously differentiable semi-algebraic functions. To mitigate the lack of local Lipschitz continuity, popular approaches revolve around $\varepsilon$-oracles and/or linesearch procedures. In contrast, we exploit plain Hölder inequalities not entailing any approximation, all while retaining the linesearch-free nature of adaptive schemes. Furthermore, we prove full sequence convergence without prior knowledge of local Hölder constants nor of the order of Hölder continuity. Numerical experiments make comparisons with baseline methods on diverse tasks from machine learning covering both the locally and the globally Hölder setting.