OC LG MLNov 22, 2018

Markov Chain Block Coordinate Descent

Tao Sun, Yuejiao Sun, Yangyang Xu, Wotao Yin

arXiv:1811.08990v120.719 citations

Originality Incremental advance

AI Analysis

This addresses optimization challenges in distributed systems and Markov decision processes where traditional block selection methods are impractical or costly.

The paper tackles the problem of block coordinate descent optimization when blocks are selected according to a Markov chain rather than i.i.d. random or cyclic schemes, proving convergence for nonconvex Lipschitz differentiable functions and establishing sublinear and linear convergence rates for convex and strongly convex cases.

The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision process, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish both sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.

View on arXiv PDF

Similar