Beyond Communication Overhead: A Multilevel Monte Carlo Approach for Mitigating Compression Bias in Distributed Learning
This work targets the communication bottleneck in distributed learning: it mitigates the bias introduced by gradient compression, offering a practical way to improve efficiency in large-scale machine learning systems.
The paper tackles the trade-off between biased and unbiased gradient compression in distributed learning. It introduces a Multilevel Monte Carlo (MLMC) compression scheme that constructs statistically unbiased estimates from biased compressors, yielding enhanced variants of popular compressors such as Top-$k$ and bit-wise compressors, which are validated empirically on distributed deep learning tasks.
Distributed learning methods have gained substantial momentum in recent years, with communication overhead often emerging as a critical bottleneck. Gradient compression techniques alleviate communication costs but involve an inherent trade-off between the empirical efficiency of biased compressors and the theoretical guarantees of unbiased compressors. In this work, we introduce a novel Multilevel Monte Carlo (MLMC) compression scheme that leverages biased compressors to construct statistically unbiased estimates. This approach effectively bridges the gap between biased and unbiased methods, combining the strengths of both. To showcase the versatility of our method, we apply it to popular compressors, such as Top-$k$ and bit-wise compressors, resulting in enhanced variants. We also derive an adaptive version of our approach that further improves its performance. We validate our method empirically on distributed deep learning tasks.
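To make the idea of building an unbiased estimate from a biased compressor concrete, the following is a minimal sketch of a generic MLMC telescoping estimator applied to Top-$k$. It is an illustration under stated assumptions, not necessarily the exact scheme of the paper: level $l$ is taken to be Top-$l$ (with Top-$0$ equal to the zero vector and Top-$d$ equal to the full vector), a level is drawn with probability $p_l$, and the scaled difference between consecutive levels is transmitted. The names `top_k`, `mlmc_term`, and `mlmc_sample`, and the level probabilities `probs`, are hypothetical choices made for this example.

```python
import random


def top_k(v, k):
    """Biased Top-k compressor: keep the k largest-magnitude entries, zero the rest."""
    out = [0.0] * len(v)
    # sorted() is stable, so Top-(k-1) always selects a subset of Top-k.
    order = sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)
    for i in order[:max(k, 0)]:
        out[i] = v[i]
    return out


def mlmc_term(v, level, p):
    """Scaled level difference (Top-level - Top-(level-1)) / p for one level."""
    hi = top_k(v, level)
    lo = top_k(v, level - 1)
    return [(a - b) / p for a, b in zip(hi, lo)]


def mlmc_sample(v, probs, rng=random):
    """Draw one level l in 1..d with probability probs[l-1] and return its term.

    Assumption: probs has length d, sums to 1, and has no zero entries.
    """
    d = len(v)
    level = rng.choices(range(1, d + 1), weights=probs, k=1)[0]
    return mlmc_term(v, level, probs[level - 1])


if __name__ == "__main__":
    v = [3.0, -1.0, 0.5, 2.0]
    probs = [0.4, 0.3, 0.2, 0.1]  # hypothetical level probabilities
    print(mlmc_sample(v, probs, random.Random(0)))
```

Under these assumptions the expectation telescopes: $\sum_{l=1}^{d} p_l \cdot (\text{Top-}l(v) - \text{Top-}(l-1)(v)) / p_l = \text{Top-}d(v) = v$, so the estimate is unbiased even though every Top-$l$ with $l < d$ is biased. Each sample contains a single nonzero coordinate, so the communication cost per sample is comparable to Top-$1$, and the choice of `probs` controls the variance.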