OC, Aug 17, 2017
More Iterations per Second, Same Quality -- Why Asynchronous Algorithms may Drastically Outperform Traditional Ones
Robert Hannah, Wotao Yin
In this paper, we consider the convergence of a very general asynchronous-parallel algorithm called ARock, which takes many well-known asynchronous algorithms as special cases (gradient descent, proximal gradient, Douglas-Rachford, ADMM, etc.). In asynchronous-parallel algorithms, the computing nodes simply use the most recent information that they have access to, instead of waiting for a full update from all nodes in the system. This means that nodes do not have to waste time waiting for information, which can be a major bottleneck, especially in distributed systems. When the system has $p$ nodes, asynchronous algorithms may complete $\Theta(\ln(p))$ more iterations than synchronous algorithms in a given time period ("more iterations per second"). Although asynchronous algorithms may compute more iterations per second, there is error associated with using outdated information. How many more iterations in total are needed to compensate for this error is still an open question. The main results of this paper aim to answer this question. We prove, loosely, that as the size of the problem becomes large, the number of additional iterations that asynchronous algorithms need becomes negligible compared to the total number ("same quality" of the iterations). Taking these facts together, our results provide solid evidence of the potential of asynchronous algorithms to vastly speed up certain distributed computations.
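One way to see where a logarithmic factor of this kind can come from is a toy timing model. The sketch below assumes, purely for illustration and not taken from the abstract, that each of the $p$ nodes needs an independent exponentially distributed time with mean 1 per update, and it reads the $\Theta(\ln(p))$ claim as a multiplicative factor. A synchronous round then lasts as long as the slowest node, whose expected time is the harmonic number $H_p \approx \ln(p) + 0.577$, while asynchronous nodes never wait. The function names are hypothetical.

```python
import math

def harmonic(p: int) -> float:
    """H_p = 1 + 1/2 + ... + 1/p, the expected maximum of p i.i.d. Exp(1) times."""
    return sum(1.0 / k for k in range(1, p + 1))

def async_to_sync_ratio(p: int) -> float:
    """Ratio of long-run update rates in the toy model (an assumption, see above).

    Synchronous: p updates per round, one round per H_p time units on average.
    Asynchronous: each node finishes one update per time unit, so p per unit.
    The ratio p / (p / H_p) simplifies to H_p.
    """
    sync_rate = p / harmonic(p)
    async_rate = float(p)
    return async_rate / sync_rate

if __name__ == "__main__":
    for p in (4, 64, 1024):
        # Compare the ratio with ln(p) to see the logarithmic growth.
        print(p, round(async_to_sync_ratio(p), 3), round(math.log(p), 3))
```

Under these assumptions the ratio grows like $\ln(p)$; other compute-time distributions would give a different factor.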
OC, Jul 18, 2017
Asynchronous Coordinate Descent under More Realistic Assumptions
Tao Sun, Robert Hannah, Wotao Yin
Asynchronous-parallel algorithms have the potential to vastly speed up algorithms by eliminating costly synchronization. However, our understanding of these algorithms is limited because the current convergence analyses of asynchronous (block) coordinate descent algorithms are based on somewhat unrealistic assumptions. In particular, the age of the shared optimization variables being used to update a block is assumed to be independent of the block being updated. Also, it is assumed that the updates are applied to randomly chosen blocks. In this paper, we argue that these assumptions either fail to hold or will imply less efficient implementations. We then prove the convergence of asynchronous-parallel block coordinate descent under more realistic assumptions, in particular, always without the independence assumption. The analysis permits both the deterministic (essentially) cyclic and random rules for block choices. Because a bound on the asynchronous delays may or may not be available, we establish convergence for both bounded delays and unbounded delays. The analysis also covers nonconvex, weakly convex, and strongly convex functions. We construct Lyapunov functions that directly model both objective progress and delays, so delays are not treated as errors or noise. A continuous-time ODE is provided to explain the construction at a high level.
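To make "Lyapunov functions that directly model both objective progress and delays" concrete, one common shape for such a function under a delay bound $\tau$ is sketched below. The symbols $f$, $x^k$, $\delta$, $\tau$, and $c$ are illustrative assumptions and are not taken from the abstract; the paper's actual construction may differ.

```latex
% Illustrative only: objective value plus a weighted sum of the last \tau steps.
% More recent steps get larger weights because they can still affect future reads.
\xi_k \;=\; f(x^k) \;+\; \delta \sum_{i=k-\tau}^{k-1} \bigl(i - (k-\tau) + 1\bigr)\,
            \bigl\| x^{i+1} - x^{i} \bigr\|^2 , \qquad \delta > 0 .

% The aim of such a construction is a sufficient-decrease inequality of the form
\xi_k - \xi_{k+1} \;\ge\; c \,\bigl\| x^{k+1} - x^{k} \bigr\|^2 , \qquad c > 0 ,
% so that staleness is absorbed into \xi_k rather than bounded as a separate error term.
```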
OC, Sep 15, 2016
On Unbounded Delays in Asynchronous Parallel Fixed-Point Algorithms
Robert Hannah, Wotao Yin
The need for scalable numerical solutions has motivated the development of asynchronous parallel algorithms, where a set of nodes run in parallel with little or no synchronization, thus computing with delayed information. This paper studies the convergence of the asynchronous parallel algorithm ARock under potentially unbounded delays. ARock is a general asynchronous algorithm that has many applications. It parallelizes fixed-point iterations by letting a set of nodes randomly choose solution coordinates and update them in an asynchronous parallel fashion. ARock takes some recent asynchronous coordinate descent algorithms as special cases and gives rise to new asynchronous operator-splitting algorithms. Existing analysis of ARock assumes the delays to be bounded, and uses this bound to set a step size that is important to both convergence and efficiency. Other work, though allowing unbounded delays, imposes strict conditions on the underlying fixed-point operator, resulting in limited applications. In this paper, convergence is established under unbounded delays, which can be either stochastic or deterministic. The proposed step sizes are more practical and generally larger than those in the existing work. The step size adapts to the delay distribution or the current delay being experienced in the system. New Lyapunov functions, which are the key to analyzing asynchronous algorithms, are constructed to obtain our results. A set of applicable optimization algorithms with large-scale applications is given, including machine learning and scientific computing algorithms.
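The coordinate update described above can be imitated in a single-threaded simulation. The following is a minimal sketch, not the authors' implementation: it assumes a small hypothetical contraction `T`, a bounded delay `max_delay`, a fixed step size `eta`, and reads that return a consistent but stale snapshot of the iterate. All names and parameter values are illustrative.

```python
import random

C = [1.0, 2.0, 3.0]  # hypothetical offsets defining the toy operator

def T(x):
    """Toy contraction: T(x)_i = 0.5 * (mean of the other coordinates) + C[i]."""
    n, s = len(x), sum(x)
    return [0.5 * (s - x[i]) / (n - 1) + C[i] for i in range(n)]

def async_fixed_point(op, x0, eta=0.1, max_delay=5, iters=5000, seed=0):
    """Simulate asynchronous coordinate updates toward a fixed point of op.

    Each step picks a random coordinate i and applies
        x_i <- x_i - eta * (x_hat_i - op(x_hat)_i),
    where x_hat is a snapshot that is up to max_delay steps old.
    """
    rng = random.Random(seed)
    x = list(x0)
    history = [list(x)]                        # recent iterates, newest last
    for _ in range(iters):
        d = rng.randint(0, len(history) - 1)   # staleness of this read
        x_hat = history[-1 - d]                # delayed copy seen by the node
        i = rng.randrange(len(x))              # randomly chosen coordinate
        x[i] -= eta * (x_hat[i] - op(x_hat)[i])
        history.append(list(x))
        del history[:-(max_delay + 1)]         # keep max_delay + 1 snapshots
    return x

if __name__ == "__main__":
    print(async_fixed_point(T, [0.0, 0.0, 0.0]))
```

For this toy operator the fixed point solves $x_i = 0.25\,(s - x_i) + C_i$ with $s = \sum_j x_j$, which gives $(3.2, 4.0, 4.8)$. The step size here is a conservative fixed choice for the assumed delay bound; the delay-adaptive step sizes that the abstract describes are not modeled.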