LG ML | Apr 23, 2019

Semi-Cyclic Stochastic Gradient Descent

Hubert Eichner, Tomer Koren, H. Brendan McMahan, Nathan Srebro, Kunal Talwar

arXiv:1904.10120v123.0117 citations | h-index: 55

Originality: Synthesis-oriented

AI Analysis

This paper addresses optimization challenges in distributed settings such as Federated Learning, but it appears incremental because it adapts existing SGD theory to a specific cyclic structure.

The paper tackles the performance degradation of SGD under block-cyclic sampling, such as in Federated Learning, and proposes a simple method to achieve performance guarantees equivalent to i.i.d. sampling.

We consider convex SGD updates with a block-cyclic structure, i.e. where each cycle consists of a small number of blocks, each with many samples from a possibly different, block-specific, distribution. This situation arises, e.g., in Federated Learning where the mobile devices available for updates at different times during the day have different characteristics. We show that such block-cyclic structure can significantly deteriorate the performance of SGD, but propose a simple approach that allows prediction with the same performance guarantees as for i.i.d., non-cyclic, sampling.
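To make the block-cyclic setting concrete, the following is a minimal Python sketch on a toy one-dimensional problem. It is not taken from the paper. The function name `block_cyclic_sgd`, the squared loss, the Gaussian block distributions, and all parameter values are assumptions chosen for illustration. The abstract does not spell out the proposed approach, so the per-block running average kept here is only one plausible illustration of block-aware prediction, not a statement of the authors' method.

```python
import random


def block_cyclic_sgd(block_means, samples_per_block, cycles, lr, noise=0.1, seed=0):
    """Run SGD on f(w; z) = 0.5 * (w - z)^2 over a block-cyclic sample stream.

    Each cycle visits the blocks in order; block i draws z ~ N(block_means[i], noise^2).
    Returns the last iterate and the running average of the iterates seen in each block.
    All names and the toy loss are illustrative assumptions, not the paper's setup.
    """
    rng = random.Random(seed)
    w = 0.0
    block_avgs = [0.0] * len(block_means)   # running mean of iterates per block
    block_counts = [0] * len(block_means)   # number of iterates seen per block

    for _ in range(cycles):
        for i, mean in enumerate(block_means):
            for _ in range(samples_per_block):
                z = rng.gauss(mean, noise)   # sample from the block-specific distribution
                w -= lr * (w - z)            # gradient of 0.5 * (w - z)^2 is (w - z)
                block_counts[i] += 1
                # incremental update of the per-block average iterate
                block_avgs[i] += (w - block_avgs[i]) / block_counts[i]

    return w, block_avgs


if __name__ == "__main__":
    last_w, avgs = block_cyclic_sgd([-1.0, 1.0], samples_per_block=200, cycles=5, lr=0.05)
    print("last iterate:", last_w)
    print("per-block averaged iterates:", avgs)
```

Under these toy settings, the last iterate ends near the mean of whichever block was visited most recently rather than near the mixture mean, while each per-block average stays on its own block's side. This is a small-scale illustration of why the sample order matters in the block-cyclic setting.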
