OCLGMLNov 22, 2018

Markov Chain Block Coordinate Descent

arXiv:1811.08990v119 citations
Originality Incremental advance
AI Analysis

This addresses optimization challenges in distributed systems and Markov decision processes where traditional block selection methods are impractical or costly.

The paper tackles the problem of block coordinate descent optimization when blocks are selected according to a Markov chain rather than i.i.d. random or cyclic schemes, proving convergence for nonconvex Lipschitz differentiable functions and establishing sublinear and linear convergence rates for convex and strongly convex cases.

The method of block coordinate gradient descent (BCD) has been a powerful method for large-scale optimization. This paper considers the BCD method that successively updates a series of blocks selected according to a Markov chain. This kind of block selection is neither i.i.d. random nor cyclic. On the other hand, it is a natural choice for some applications in distributed optimization and Markov decision process, where i.i.d. random and cyclic selections are either infeasible or very expensive. By applying mixing-time properties of a Markov chain, we prove convergence of Markov chain BCD for minimizing Lipschitz differentiable functions, which can be nonconvex. When the functions are convex and strongly convex, we establish both sublinear and linear convergence rates, respectively. We also present a method of Markov chain inertial BCD. Finally, we discuss potential applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes