Accelerated SGD for Non-Strongly-Convex Least Squares
The result is optimal for a fundamental machine learning optimization problem, although the method may appear incremental, since it is a modification of accelerated gradient descent.
The paper tackles stochastic approximation for non-strongly convex least squares regression, presenting a practical algorithm that achieves the optimal O(d/t) prediction error rate in its dependence on the noise and accelerates the forgetting of initial conditions to O(d/t^2).
We consider stochastic approximation for the least squares regression problem in the non-strongly convex setting. We present the first practical algorithm that achieves the optimal prediction error rate in terms of its dependence on the noise of the problem, $O(d/t)$, while accelerating the forgetting of the initial conditions to $O(d/t^2)$. Our new algorithm is based on a simple modification of accelerated gradient descent. We provide convergence results for both the averaged and the last iterate of the algorithm. To establish the tightness of these new bounds, we present a matching lower bound in the noiseless setting and thus show the optimality of our algorithm.
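To make the setting concrete, the following is a minimal sketch of stochastic gradient steps on the least squares objective with a momentum (extrapolation) term, tracking both the last iterate and the running average. It is not the algorithm of the paper, whose exact modification is not given here. The function name `averaged_momentum_sgd`, the constant step size `gamma`, the constant momentum `delta`, the standard Gaussian features, and the noise level `sigma` are all assumptions chosen for illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def sq_dist(a, b):
    """Squared Euclidean distance. With identity feature covariance (as below),
    the excess prediction error is half this quantity."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def averaged_momentum_sgd(theta_star, steps, gamma=0.05, delta=0.5,
                          sigma=0.0, seed=0):
    """Illustrative momentum SGD for least squares (hypothetical parameters).

    Each step draws one sample (x, y) with y = <x, theta_star> + noise,
    takes a stochastic gradient step from the extrapolated point nu,
    then extrapolates again. Returns (last iterate, averaged iterate).
    """
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d  # current iterate, started at the origin
    nu = [0.0] * d     # extrapolated point where the gradient is evaluated
    avg = [0.0] * d    # running average of the iterates
    for t in range(1, steps + 1):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = dot(x, theta_star) + sigma * rng.gauss(0.0, 1.0)
        residual = dot(x, nu) - y
        # Stochastic gradient of 0.5 * (<x, nu> - y)^2 is residual * x.
        new_theta = [n - gamma * residual * xi for n, xi in zip(nu, x)]
        # Momentum step: extrapolate along the most recent displacement.
        nu = [nt + delta * (nt - th) for nt, th in zip(new_theta, theta)]
        theta = new_theta
        # Incremental update of the average of theta_1, ..., theta_t.
        avg = [a + (th - a) / t for a, th in zip(avg, theta)]
    return theta, avg


if __name__ == "__main__":
    theta_star = [1.0, -2.0, 0.5]
    last, avg = averaged_momentum_sgd(theta_star, steps=2000)
    print("last iterate squared error:", sq_dist(last, theta_star))
    print("averaged iterate squared error:", sq_dist(avg, theta_star))
```

With `sigma=0.0` the run corresponds to the noiseless setting, where only the forgetting of the initial condition matters. Setting `sigma` above zero adds the noise term, where averaging is what typically keeps the error from stalling at a level set by the step size.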