Randomized Block Coordinate Descent for Online and Stochastic Optimization
This work addresses optimization efficiency for machine learning practitioners who work with high-dimensional data and large sample sizes, although the contribution is incremental, since it hybridizes existing methods.
The paper tackles composite minimization on large-scale data by combining online/stochastic gradient descent (OGD/SGD) with randomized block coordinate descent (RBCD). The proposed method, Online Randomized Block Coordinate Descent (ORBCD), computes only the partial gradient of one coordinate block on one mini-batch per iteration. The paper shows that the iteration complexity of ORBCD is of the same order as that of OGD/SGD and that, with variance reduction, ORBCD converges geometrically in expectation for strongly convex functions, matching the rates of variance-reduced SGD and RBCD.
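For reference, the composite problem and a generic ORBCD-style iteration can be written as follows. This is a hedged reconstruction, not the paper's exact statement: the block-separable form of the regularizer, the notation ($N$ samples, $J$ blocks, mini-batch $\mathcal{I}_t$, block index $j_t$, step size $\eta_t$) and the proximal form of the block update are all assumptions.

$$
\min_{x \in \mathbb{R}^{n}} \; F(x) = \frac{1}{N}\sum_{i=1}^{N} f_i(x) + \sum_{j=1}^{J} g_j(x_j)
$$

$$
x_{j_t}^{t+1} = \operatorname{prox}_{\eta_t g_{j_t}}\!\Big(x_{j_t}^{t} - \frac{\eta_t}{|\mathcal{I}_t|}\sum_{i \in \mathcal{I}_t} \nabla_{j_t} f_i(x^{t})\Big), \qquad x_{k}^{t+1} = x_{k}^{t} \;\; (k \neq j_t)
$$

Evaluating only the $j_t$-th block of the mini-batch gradient, rather than the full gradient of a mini-batch (SGD) or a block gradient over all $N$ samples (RBCD), is the source of the low per-iteration cost.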
Two types of low cost-per-iteration gradient descent methods have been extensively studied in parallel. One is online or stochastic gradient descent (OGD/SGD), and the other is randomized block coordinate descent (RBCD). In this paper, we combine the two types of methods and propose online randomized block coordinate descent (ORBCD). At each iteration, ORBCD computes only the partial gradient of one block of coordinates on one mini-batch of samples. ORBCD is well suited for the composite minimization problem where one function is the average of the losses of a large number of samples and the other is a simple regularizer defined on high-dimensional variables. We show that the iteration complexity of ORBCD has the same order as that of OGD or SGD. For strongly convex functions, by reducing the variance of stochastic gradients, we show that ORBCD can converge at a geometric rate in expectation, matching the convergence rate of SGD with variance reduction and RBCD.
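A minimal NumPy sketch of the iteration described above, under several assumptions not stated in the abstract: the loss is least squares, the regularizer is lam * ||x||_1 (so its proximal operator is soft-thresholding), blocks are contiguous splits of the coordinates, plain mode uses a step size decaying as 1/sqrt(epoch), and the variance-reduced mode uses an SVRG-style snapshot correction. The function name orbcd_lasso and all parameter defaults are hypothetical; this illustrates the technique and is not the authors' implementation.

```python
import numpy as np


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)


def orbcd_lasso(A, b, lam, n_blocks=4, batch_size=8, step=0.05,
                n_epochs=40, n_inner=None, variance_reduction=False, seed=0):
    """Hypothetical ORBCD-style solver for
        min_x (1/N) * sum_i 0.5 * (a_i @ x - b_i)**2 + lam * ||x||_1.

    Each inner step samples one mini-batch and one coordinate block and
    updates only that block with a proximal (soft-thresholding) step.
    variance_reduction=True adds an assumed SVRG-style snapshot correction.
    """
    rng = np.random.default_rng(seed)
    N, n = A.shape
    blocks = np.array_split(np.arange(n), n_blocks)  # contiguous blocks (assumed)
    x = np.zeros(n)
    n_inner = n_inner or N  # inner steps per epoch (assumed default)
    x_snap, full_grad = None, None

    for epoch in range(n_epochs):
        if variance_reduction:
            # Snapshot and the full gradient of the smooth part at the snapshot.
            x_snap = x.copy()
            full_grad = A.T @ (A @ x_snap - b) / N
            eta = step  # constant step once the variance is reduced
        else:
            eta = step / np.sqrt(epoch + 1)  # decaying step (assumed schedule)
        for _ in range(n_inner):
            batch = rng.choice(N, size=batch_size, replace=False)
            blk = blocks[rng.integers(n_blocks)]
            A_b = A[batch]
            # Partial gradient of the mini-batch loss w.r.t. the chosen block only.
            g = A_b[:, blk].T @ (A_b @ x - b[batch]) / batch_size
            if variance_reduction:
                g_snap = A_b[:, blk].T @ (A_b @ x_snap - b[batch]) / batch_size
                g = g - g_snap + full_grad[blk]  # unbiased, lower-variance estimate
            # Proximal step on the selected block; all other blocks stay fixed.
            x[blk] = soft_threshold(x[blk] - eta * g, eta * lam)
    return x
```

In the variance-reduced mode the correction term has zero mean and shrinks as x and the snapshot approach the optimum, which is what permits a constant step size; the plain mode relies on a decaying step to damp the stochastic-gradient noise.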