LGDec 6, 2022
Straggler-Resilient Differentially-Private Decentralized LearningYauhen Yakimenka, Chung-Wei Weng, Hsuan-Yin Lin et al.
We consider the straggler problem in decentralized learning over a logical ring while preserving user data privacy. Especially, we extend the recently proposed framework of differential privacy (DP) amplification by decentralization by Cyffers and Bellet to include overall training latency--comprising both computation and communication latency. Analytical results on both the convergence speed and the DP level are derived for both a skipping scheme (which ignores the stragglers after a timeout) and a baseline scheme that waits for each node to finish before the training continues. A trade-off between overall training latency, accuracy, and privacy, parameterized by the timeout of the skipping scheme, is identified and empirically validated for logistic regression on a real-world dataset and for image classification using the MNIST and CIFAR-10 datasets.
LGNov 11, 2024
Differentially-Private Collaborative Online Personalized Mean EstimationYauhen Yakimenka, Chung-Wei Weng, Hsuan-Yin Lin et al.
We consider the problem of collaborative personalized mean estimation under a privacy constraint in an environment of several agents continuously receiving data according to arbitrary unknown agent-specific distributions. In particular, we provide a method based on hypothesis testing coupled with differential privacy and data variance estimation. Two privacy mechanisms and two data variance estimation schemes are proposed, and we provide a theoretical convergence analysis of the proposed algorithm for any bounded unknown distributions on the agents' data, showing that collaboration provides faster convergence than a fully local approach where agents do not share data. Moreover, we provide analytical performance curves for the case with an oracle class estimator, i.e., the class structure of the agents, where agents receiving data from distributions with the same mean are considered to be in the same class, is known. The theoretical faster-than-local convergence guarantee is backed up by extensive numerical results showing that for a considered scenario the proposed approach indeed converges much faster than a fully local approach, and performs comparably to ideal performance where all data is public. This illustrates the benefit of private collaboration in an online setting.
LGDec 7, 2020
Generative Adversarial User Privacy in Lossy Single-Server Information RetrievalChung-Wei Weng, Yauhen Yakimenka, Hsuan-Yin Lin et al.
We propose to extend the concept of private information retrieval by allowing for distortion in the retrieval process and relaxing the perfect privacy requirement at the same time. In particular, we study the trade-off between download rate, distortion, and user privacy leakage, and show that in the limit of large file sizes this trade-off can be captured via a novel information-theoretical formulation for datasets with a known distribution. Moreover, for scenarios where the statistics of the dataset is unknown, we propose a new deep learning framework by leveraging a generative adversarial network approach, which allows the user to learn efficient schemes from the data itself. We evaluate the performance of the scheme on a synthetic Gaussian dataset as well as on the MNIST, CIFAR-10, and LSUN datasets. For the MNIST, CIFAR-10, and LSUN datasets, the data-driven approach significantly outperforms a nonlearning-based scheme which combines source coding with the download of multiple files.