OC LGApr 3, 2024

Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach

arXiv:2404.02511v15.63 citationsh-index: 11

Originality Incremental advance

AI Analysis

This work addresses communication and privacy challenges for users in decentralized applications, offering a method with broader applicability than prior work by relaxing assumptions about solution interiors.

The paper tackles the problem of communication efficiency and privacy in decentralized machine learning by proposing a primal-dual sliding with conditional gradient sliding framework, achieving optimal gradient complexities of O(1/√ε + σ²/ε²) for convex and O(log(1/ε) + σ²/ε) for strongly convex settings with a stochastic gradient oracle.

In modern decentralized applications, ensuring communication efficiency and privacy for the users are the key challenges. In order to train machine-learning models, the algorithm has to communicate to the data center and sample data for its gradient computation, thus exposing the data and increasing the communication cost. This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations. To this end, we propose the primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution with the optimal gradient complexity of $O(1/\sqrt{\varepsilon}+σ^2/{\varepsilon^2})$ and $O(\log(1/\varepsilon)+σ^2/\varepsilon)$ for the convex and strongly convex setting respectively and an LO (Linear Optimization) complexity of $O(1/\varepsilon^2)$ for both settings given a stochastic gradient oracle with variance $σ^2$. Compared with the prior work \cite{wai-fw-2017}, our framework relaxes the assumption of the optimal solution being a strict interior point of the feasible set and enjoys wider applicability for large-scale training using a stochastic gradient oracle. We also demonstrate the efficiency of our algorithms with various numerical experiments.

View on arXiv PDF

Similar