Privacy Preserving Machine Learning Workflow: from Anonymization to Personalized Differential Privacy Budgets in Federated Learning
For practitioners in federated learning, this work offers a method to balance privacy and utility by tailoring differential privacy budgets to individual clients, though the improvement is demonstrated on a single dataset and may be incremental.
This paper presents a privacy-preserving federated learning workflow for sensitive tabular data, introducing a formal definition for client drift and a methodology for assigning personalized differential privacy budgets based on re-identification risk. Experiments on a medical dataset show that personalized budgets improve model performance over fixed global budgets, as measured by two error metrics.
The growing development of artificial-intelligence-based solutions, together with privacy legislation, has driven the rise of so-called privacy-preserving machine learning architectures, such as federated learning. While federated learning enables model training on decentralized data without sharing or centralizing them, it still faces several challenges related to data integrity and privacy. This paper presents a comprehensive privacy-preserving federated learning workflow for sensitive tabular data, including anonymization and differential privacy techniques. We also introduce a formal definition of client drift, together with ways of detecting it in order to mitigate poisoning attacks. We then detail a complete methodology for assigning personalized privacy budgets for global differential privacy to the clients participating in the network, based on a re-identification risk metric. The proposed methodology is presented and tested on an openly available dataset of medical records. In this experimental setup, we show that the approach based on personalized budgets achieves better model performance, in terms of two error metrics, than the architecture using global differential privacy with a fixed privacy budget.
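To make the idea of risk-based personalized budgets concrete, the following is a minimal sketch, not the paper's actual method. The abstract does not specify the re-identification risk metric or the rule that maps risk to a budget, so both are assumptions here: risk is taken as the fraction of a client's records that are unique on a set of quasi-identifiers, and the budget is a linear interpolation between two hypothetical bounds, `eps_min` and `eps_max`, so that riskier clients receive a smaller epsilon (stronger privacy). The last helper uses the standard Laplace mechanism relation, scale = sensitivity / epsilon, to show how a smaller budget translates into more noise.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Fraction of records unique on the quasi-identifiers.

    Assumption: a simple uniqueness-based risk in [0, 1]; the paper's
    metric may be defined differently.
    """
    if not records:
        return 0.0
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)


def personalized_epsilon(risk, eps_min=0.5, eps_max=5.0):
    """Map risk to a privacy budget (hypothetical linear rule).

    risk = 0 gives eps_max (least noise); risk = 1 gives eps_min (most noise).
    """
    if not 0.0 <= risk <= 1.0:
        raise ValueError("risk must lie in [0, 1]")
    return eps_max - risk * (eps_max - eps_min)


def laplace_scale(sensitivity, epsilon):
    """Noise scale of the Laplace mechanism for a given budget."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return sensitivity / epsilon


# Illustrative data only: two of the four records are unique on (age, zip).
client_records = [
    {"age": 30, "zip": "A"},
    {"age": 30, "zip": "A"},
    {"age": 41, "zip": "B"},
    {"age": 52, "zip": "C"},
]
risk = reidentification_risk(client_records, ["age", "zip"])
epsilon = personalized_epsilon(risk)
scale = laplace_scale(sensitivity=1.0, epsilon=epsilon)
```

A fixed-budget baseline corresponds to calling `laplace_scale` with the same epsilon for every client. The personalized variant lets low-risk clients contribute with less noise, which is one plausible reason for the improved error metrics reported in the abstract.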