Problem-dependent convergence bounds for randomized linear gradient compression
This work provides problem-dependent convergence bounds for gradient compression, offering insights for distributed optimization practitioners, though it is incremental in refining existing theoretical frameworks.
The paper investigates how gradient compression affects convergence in distributed non-convex stochastic optimization, showing that the iteration penalty depends on problem structure like low-rank properties, with bounds predicting reduced penalties compared to worst-case analyses.
In distributed optimization, the communication of model updates can be a performance bottleneck. Consequently, gradient compression has been proposed as a means of increasing optimization throughput. In general, due to information loss, compression introduces a penalty on the number of iterations needed to reach a solution. In this work, we investigate how the iteration penalty depends on the interaction between compression and problem structure, in the context of non-convex stochastic optimization. We focus on linear schemes, where compression and decompression can be modeled as multiplication with a random matrix. We consider several distributions of matrices, among them Haar-distributed orthogonal matrices and matrices with random Gaussian entries. We find that the impact of compression on convergence can be quantified in terms of a smoothness matrix associated with the objective function, using a norm defined by the compression scheme. The analysis reveals that in certain cases, compression performance is related to low-rank structure or other spectral properties of the problem and our bounds predict that the penalty introduced by compression is significantly reduced compared to worst-case bounds that only consider the compression level, ignoring problem data. We verify the theoretical findings experimentally, including fine-tuning an image classification model.