Stochastic gradient descent algorithms for strongly convex functions at O(1/T) convergence rates
This work provides an incremental improvement in optimization algorithms for machine learning practitioners dealing with strongly convex problems.
The paper tackled the problem of improving convergence rates for stochastic gradient descent on strongly convex functions, achieving a high probability convergence rate of O(κ/T) with a weighting scheme, instead of the previous O(κ ln(T)/T).
With a weighting scheme proportional to t, a traditional stochastic gradient descent (SGD) algorithm achieves a high probability convergence rate of O(κ/T) for strongly convex functions, instead of O(κ ln(T)/T). We also prove that an accelerated SGD algorithm also achieves a rate of O(κ/T).