LG CR | Oct 28, 2022

Machine Unlearning of Federated Clusters

Chao Pan, Jin Sima, Saurav Prakash, Vishal Rana, Olgica Milenkovic

arXiv:2210.16424v216.144 citations | h-index: 48 | Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the need for data removal in federated clustering systems under privacy laws such as the "right to be forgotten". It is rated incremental because it builds on existing federated learning and clustering methods.

The paper tackles the problem of machine unlearning for federated clustering, introducing an efficient unlearning mechanism that offers an average speed-up of roughly 84x compared to complete retraining across seven datasets.

Federated clustering (FC) is an unsupervised learning problem that arises in a number of practical applications, including personalized recommender and healthcare systems. With the adoption of recent laws ensuring the "right to be forgotten", the problem of machine unlearning for FC methods has become of significant importance. We introduce, for the first time, the problem of machine unlearning for FC, and propose an efficient unlearning mechanism for a customized secure FC framework. Our FC framework utilizes special initialization procedures that we show are well-suited for unlearning. To protect client data privacy, we develop the secure compressed multiset aggregation (SCMA) framework that addresses sparse secure federated learning (FL) problems encountered during clustering as well as more general problems. To simultaneously facilitate low communication complexity and secret sharing protocols, we integrate Reed-Solomon encoding with special evaluation points into our SCMA pipeline, and prove that the client communication cost is logarithmic in the vector dimension. Additionally, to demonstrate the benefits of our unlearning mechanism over complete retraining, we provide a theoretical analysis for the unlearning performance of our approach. Simulation results show that the new FC framework exhibits superior clustering performance compared to previously reported FC baselines when the cluster sizes are highly imbalanced. Compared to completely retraining K-means++ locally and globally for each removal request, our unlearning procedure offers an average speed-up of roughly 84x across seven datasets. Our implementation for the proposed method is available at https://github.com/thupchnsky/mufc.
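The abstract does not spell out how the initialization procedure supports unlearning, so the following is only an illustrative sketch under stated assumptions, not the authors' algorithm. It assumes a K-means++-style seeding whose output is fully determined by the chosen seed points, and a hypothetical removal handler `unlearn` that reseeds only when the removed point was itself a seed. The function names and the toy data are invented for illustration, and the sketch says nothing about the secure aggregation (SCMA) step or about the guarantees proved in the paper.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng):
    """Return the indices of k seeds chosen by K-means++ (D^2) sampling.

    Assumes `points` contains at least k distinct points.
    """
    seeds = [rng.randrange(len(points))]
    while len(seeds) < k:
        # Weight of each point = squared distance to its nearest chosen seed.
        # Already-chosen seeds get weight 0 and cannot be drawn again.
        weights = [min(sq_dist(p, points[s]) for s in seeds) for p in points]
        seeds.append(rng.choices(range(len(points)), weights=weights, k=1)[0])
    return seeds


def unlearn(points, seeds, remove_idx, k, rng):
    """Hypothetical removal handler (an assumption, not the paper's method).

    Returns (new_points, new_seeds, reseeded).
    """
    new_points = points[:remove_idx] + points[remove_idx + 1:]
    if remove_idx not in seeds:
        # The removed point was not a seed: keep the seeds and only shift
        # the indices that came after the removed position.
        new_seeds = [s - 1 if s > remove_idx else s for s in seeds]
        return new_points, new_seeds, False
    # The removed point was a seed: fall back to reseeding from scratch.
    return new_points, kmeanspp_seed(new_points, k, rng), True


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.1, 0.0)]
    seeds = kmeanspp_seed(data, 3, rng)
    victim = next(i for i in range(len(data)) if i not in seeds)
    _, _, reseeded = unlearn(data, seeds, victim, 3, rng)
    print("reseeded after removing a non-seed point:", reseeded)
```

Under this assumption, only k of the n points are seeds, so most removal requests would skip the reseeding branch. That gives some intuition for why a seeding-based scheme could be much cheaper than retraining on every request, though the roughly 84x figure above comes from the authors' own experiments, not from this sketch.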

