LG · CR · Jun 3

DPDL: Towards Differential Privacy Preservation in Decentralized Stochastic Learning on Non-IID Data

Yunsheng Yuan, Xue Xiao, Lina Wang, Feng Li

arXiv:2606.04399 · 68.8

Predicted impact: top 21% in LG · last 90 days · Originality: Incremental advance

AI Analysis

For decentralized learning systems with non-IID data, DPDL provides a privacy-preserving method that maintains training efficiency, addressing a key bottleneck in real-world collaborative learning.

DPDL introduces differential privacy into decentralized learning on non-IID data via similarity-based calibration of perturbed cross-gradients, achieving linear speedup in training while defending against privacy attacks.

In decentralized learning, a group of agents collaborates to train a global model on distributed datasets without a central server. Although the power of such collaboration has been verified by many state-of-the-art studies, it requires extensive exchange of gradient information among the agents and thus exposes individual agents to a high risk of privacy leakage. Moreover, in real-world applications, the training data are usually not independently and identically distributed (non-IID) across the agents, which makes privacy-preserving decentralized learning even more challenging. To address these issues, we propose DPDL, a privacy-preserving decentralized learning algorithm for non-IID data, which applies Differential Privacy (DP) to cross-gradient aggregation through a similarity-based calibration technique. Specifically, in each round, each agent perturbs its cross-gradients (i.e., the derivatives of its neighbors' local models on its private local data) with the Gaussian noise mechanism before sharing them with its neighbors; it then uses cosine similarity to calibrate the perturbed cross-gradients it receives, so that the aggregation of the calibrated cross-gradients can be used to update its local model effectively in a momentum-like manner. Our rigorous theoretical analysis not only reveals the minimum noise level required to achieve a specific level of privacy preservation, but also shows that our algorithm still achieves a linear speedup in training with non-IID data. Finally, we conduct extensive experiments on real-world datasets to validate the effectiveness of our algorithm in defending against privacy attacks and in training accurate models.
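The per-round procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact calibration rule, so the clipping step, the non-negative cosine weighting, the fallback to the local gradient, and all function and parameter names (`perturb`, `calibrate_and_aggregate`, `momentum_step`, `clip_norm`, `noise_std`, `beta`) are assumptions made for illustration only.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def clip(v, max_norm):
    """Scale v down so its L2 norm is at most max_norm (assumed step, bounds sensitivity)."""
    n = l2_norm(v)
    scale = min(1.0, max_norm / n) if n > 0 else 1.0
    return [x * scale for x in v]


def perturb(grad, clip_norm, noise_std, rng):
    """Gaussian mechanism: clip the gradient, then add i.i.d. Gaussian noise per coordinate."""
    return [x + rng.gauss(0.0, noise_std) for x in clip(grad, clip_norm)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is zero."""
    nu, nv = l2_norm(u), l2_norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def calibrate_and_aggregate(local_grad, received):
    """Weight each received (noisy) cross-gradient by its non-negative cosine
    similarity to the local gradient, then take the weighted average.
    Hypothetical calibration rule; falls back to the local gradient if all weights are 0."""
    weights = [max(0.0, cosine(local_grad, g)) for g in received]
    total = sum(weights)
    if total == 0:
        return list(local_grad)
    dim = len(local_grad)
    return [sum(w * g[i] for w, g in zip(weights, received)) / total
            for i in range(dim)]


def momentum_step(model, velocity, direction, lr, beta):
    """Momentum-like update: v <- beta * v + direction; model <- model - lr * v."""
    new_velocity = [beta * v + d for v, d in zip(velocity, direction)]
    new_model = [m - lr * v for m, v in zip(model, new_velocity)]
    return new_model, new_velocity


if __name__ == "__main__":
    rng = random.Random(0)
    local_grad = [1.0, 0.5]
    # Cross-gradients as they would arrive from two neighbors, already perturbed.
    received = [perturb([0.9, 0.6], clip_norm=1.0, noise_std=0.1, rng=rng),
                perturb([-1.0, 0.2], clip_norm=1.0, noise_std=0.1, rng=rng)]
    direction = calibrate_and_aggregate(local_grad, received)
    model, velocity = momentum_step([0.0, 0.0], [0.0, 0.0], direction, lr=0.1, beta=0.9)
    print(model, velocity)
```

In this sketch, a received cross-gradient that points away from the agent's own gradient gets zero weight, which is one simple way to limit the influence of both noise and non-IID drift. The noise level `noise_std` is left as a free parameter here; the minimum value needed for a given privacy level is what the paper's analysis derives.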

