LG DC OCAug 17, 2023

Distributed Extra-gradient with Optimal Complexity and Communication Guarantees

Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, Volkan Cevher

arXiv:2308.09187v16.64 citationsh-index: 60Has Code

Originality Incremental advance

AI Analysis

This work addresses communication bottlenecks in distributed optimization for machine learning practitioners, offering an incremental improvement with adaptive compression and step-size rules tailored to variational inequalities.

The paper tackles the problem of communication inefficiency in distributed monotone variational inequality (VI) problems, such as min-max and game scenarios, by proposing a quantized generalized extra-gradient method (Q-GenX) that achieves optimal convergence rates of O(1/T) under relative noise and O(1/√T) under absolute noise, with validation through real-world experiments including training generative adversarial networks on multiple GPUs.

We consider monotone variational inequality (VI) problems in multi-GPU settings where multiple processors/workers/clients have access to local stochastic dual vectors. This setting includes a broad range of important problems from distributed convex minimization to min-max and games. Extra-gradient, which is a de facto algorithm for monotone VI problems, has not been designed to be communication-efficient. To this end, we propose a quantized generalized extra-gradient (Q-GenX), which is an unbiased and adaptive compression method tailored to solve VIs. We provide an adaptive step-size rule, which adapts to the respective noise profiles at hand and achieve a fast rate of ${\mathcal O}(1/T)$ under relative noise, and an order-optimal ${\mathcal O}(1/\sqrt{T})$ under absolute noise and show distributed training accelerates convergence. Finally, we validate our theoretical results by providing real-world experiments and training generative adversarial networks on multiple GPUs.

View on arXiv PDF Code

Similar