DADAM: A Consensus-based Distributed Adaptive Gradient Method for Online Optimization
This work addresses the problem of efficient distributed optimization for machine learning practitioners, but the contribution is incremental: it builds on existing adaptive methods with a decentralized twist.
The paper tackles the high communication cost of parallelizing adaptive gradient methods like Adam for large-scale machine learning by proposing DADAM, a consensus-based distributed adaptive gradient method for online optimization over decentralized networks. The authors establish theoretically that DADAM can outperform centralized adaptive algorithms for certain classes of loss functions, and their empirical results show that it compares favorably with competing online optimization methods.
Adaptive gradient-based optimization methods such as \textsc{Adagrad}, \textsc{Rmsprop}, and \textsc{Adam} are widely used in solving large-scale machine learning problems, including deep learning. A number of schemes have been proposed in the literature to parallelize them, based on communication between peripheral nodes and a central node, but these schemes incur high communication costs. To address this issue, we develop a novel consensus-based distributed adaptive moment estimation method (\textsc{Dadam}) for online optimization over a decentralized network that enables data parallelization as well as decentralized computation. The method is particularly useful, since it can accommodate settings where access to local data is allowed. Further, as established theoretically in this work, it can outperform centralized adaptive algorithms for certain classes of loss functions used in applications. We analyze the convergence properties of the proposed algorithm and provide a dynamic regret bound on the convergence rate of adaptive moment estimation methods in both stochastic and deterministic settings. Empirical results demonstrate that \textsc{Dadam} also works well in practice and compares favorably to competing online optimization methods.
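To make the idea of a consensus-based adaptive method concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the update rule of \textsc{Dadam} as defined in the paper, which the abstract does not spell out. It assumes a generic scheme in which each node keeps its own Adam-style moment estimates, averages its iterate with its neighbors through a doubly stochastic mixing matrix, and then takes a local adaptive step. The ring topology, the quadratic local losses, the decaying step size, and the names `ring_mixing_matrix` and `consensus_adam` are all assumptions made for illustration. The losses are also held fixed over time, whereas the paper treats the online setting.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n nodes.

    Each node puts weight 1/3 on itself and on each of its two neighbors
    (hypothetical topology chosen for illustration).
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def consensus_adam(grad_fns, W, dim, steps=500, lr=0.05,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """Generic consensus + Adam-style sketch (assumed form, not the paper's).

    grad_fns[i](x) returns the gradient of node i's local loss at x.
    No central node is involved: nodes only mix iterates via W.
    """
    n = len(grad_fns)
    x = np.zeros((n, dim))  # one iterate per node
    m = np.zeros((n, dim))  # local first-moment estimates
    v = np.zeros((n, dim))  # local second-moment estimates
    for t in range(1, steps + 1):
        # Each node evaluates the gradient of its own local loss only.
        g = np.stack([grad_fns[i](x[i]) for i in range(n)])
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        m_hat = m / (1.0 - beta1 ** t)  # bias-corrected moments, as in Adam
        v_hat = v / (1.0 - beta2 ** t)
        # Consensus step (neighbor averaging), then a local adaptive step
        # with an assumed 1/sqrt(t) step-size decay.
        x = W @ x - (lr / np.sqrt(t)) * m_hat / (np.sqrt(v_hat) + eps)
    return x


# Hypothetical local data: node i holds f_i(x) = 0.5 * ||x - c_i||^2.
targets = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [6.0, 2.0]])
grad_fns = [lambda x, c=c: x - c for c in targets]
W = ring_mixing_matrix(len(targets))
x_final = consensus_adam(grad_fns, W, dim=targets.shape[1])
```

In this toy setup the rows of `x_final` should end up close to one another, because the mixing step pulls the nodes toward agreement while the step size shrinks. The sketch makes no claim that the nodes reach the exact minimizer of the average loss: each node scales its step by its own second-moment estimate, so the averaged update direction is generally not the gradient of the average loss.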