A Distributed Flexible Delay-tolerant Proximal Gradient Algorithm
This work addresses scalability challenges in distributed machine learning for researchers and practitioners dealing with heterogeneous and delayed systems, though it is incremental in adapting existing methods to more flexible settings.
The authors tackled the problem of distributed convex optimization with asynchronous communication and varying system conditions by developing a flexible proximal gradient algorithm that converges linearly for strongly convex objectives and provides convergence guarantees otherwise, achieving rates comparable to the vanilla proximal gradient algorithm while being scalable with stepsizes independent of delays or machine count.
We develop and analyze an asynchronous algorithm for distributed convex optimization when the objective writes a sum of smooth functions, local to each worker, and a non-smooth function. Unlike many existing methods, our distributed algorithm is adjustable to various levels of communication cost, delays, machines computational power, and functions smoothness. A unique feature is that the stepsizes do not depend on communication delays nor number of machines, which is highly desirable for scalability. We prove that the algorithm converges linearly in the strongly convex case, and provide guarantees of convergence for the non-strongly convex case. The obtained rates are the same as the vanilla proximal gradient algorithm over some introduced epoch sequence that subsumes the delays of the system. We provide numerical results on large-scale machine learning problems to demonstrate the merits of the proposed method.