Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
This work addresses optimization challenges in training deep neural networks that matter to practitioners, but the contribution appears incremental because it builds on existing stochastic block-coordinate descent methods.
The paper tackles the problem of outliers in the training data degrading optimization. It introduces BCSC, a stochastic first-order algorithm that adds a cyclic constraint to block-coordinate descent, and reports empirical improvements in accuracy and convergence speed on benchmark datasets.
We present a stochastic first-order optimization algorithm, named BCSC, that adds a cyclic constraint to stochastic block-coordinate descent. It uses different subsets of the data to update different subsets of the parameters, thus limiting the detrimental effect of outliers in the training set. Empirical tests on benchmark datasets show that our algorithm outperforms state-of-the-art optimization methods in both accuracy and convergence speed. The improvements are consistent across different architectures and can be combined with other training techniques and regularization methods.
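The idea of using different data subsets to update different parameter subsets under a cyclic constraint can be illustrated with a small sketch. This is not the authors' implementation. It assumes a hypothetical schedule in which each mini-batch is split into K sub-batches and the parameters into K blocks, and at step t sub-batch i updates block (i + t) mod K. Under that assumption every block is updated exactly once per step, each by a different sub-batch, and the pairing rotates over time. A linear least-squares model stands in for a deep network, and all function names (`block_schedule`, `split_blocks`, `grad_block`, `bcsc_step`, `make_batch`) are illustrative.

```python
import random


def block_schedule(num_blocks, step):
    """Hypothetical cyclic pairing: sub-batch i updates block (i + step) % num_blocks."""
    return [(i, (i + step) % num_blocks) for i in range(num_blocks)]


def split_blocks(dim, num_blocks):
    """Partition parameter indices 0..dim-1 into num_blocks contiguous blocks."""
    size, rem = divmod(dim, num_blocks)
    blocks, start = [], 0
    for b in range(num_blocks):
        end = start + size + (1 if b < rem else 0)
        blocks.append(list(range(start, end)))
        start = end
    return blocks


def grad_block(w, batch, idx):
    """Gradient of the mean of 0.5 * (w.x - y)^2 over batch, restricted to indices idx."""
    g = [0.0] * len(idx)
    for x, y in batch:
        r = sum(wi * xi for wi, xi in zip(w, x)) - y  # residual for this sample
        for k, j in enumerate(idx):
            g[k] += r * x[j]
    return [gk / len(batch) for gk in g]


def bcsc_step(w, minibatch, blocks, step, lr):
    """One step of the assumed block-cyclic scheme (a sketch, not the paper's code)."""
    k = len(blocks)
    sub = [minibatch[i::k] for i in range(k)]  # split the mini-batch into k sub-batches
    new_w = list(w)
    for i, b in block_schedule(k, step):
        idx = blocks[b]
        # Gradients are evaluated at the iterate from the start of the step;
        # sub-batch i only touches the parameters in block b.
        g = grad_block(w, sub[i], idx)
        for j, gj in zip(idx, g):
            new_w[j] -= lr * gj
    return new_w


def make_batch(rng, w_true, n):
    """Noise-free synthetic regression data, y = w_true . x with x ~ U[-1, 1]^d."""
    batch = []
    for _ in range(n):
        x = [rng.uniform(-1.0, 1.0) for _ in w_true]
        batch.append((x, sum(a * b for a, b in zip(w_true, x))))
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [2.0, -1.0, 0.5, 3.0]
    blocks = split_blocks(len(w_true), 2)
    w = [0.0] * len(w_true)
    for t in range(1000):
        w = bcsc_step(w, make_batch(rng, w_true, 8), blocks, t, lr=0.1)
    print([round(v, 3) for v in w])
```

Under this assumed schedule, an outlier that lands in sub-batch i can move only one parameter block in a given step, while the rotation ensures that every block eventually receives updates from every sub-batch.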