Chia-Yuan Wu

LG, Sep 14, 2024

Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning

Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson

In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to rely on frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed environments, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute algorithmic quantities (e.g., aggregation weights), which can leak client information. To address these challenges, we propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios without the need to pass information between clients and server iteratively. In the first stage, for each client, we use its local dataset to obtain a synthetic dataset by solving a bilevel optimization problem that aims to ensure that the ultimate global model yields fair predictions. In the second stage, we apply a method with differential privacy guarantees to the synthetic dataset from the first stage to obtain a second synthetic dataset. We then pass each client's second-stage synthetic dataset to the server, the collection of which is used to train the server model using conventional machine learning techniques (that no longer need to take fairness metrics or privacy into account). Thus, we eliminate the need to handle fairness-specific aggregation weights while preserving client privacy. Our approach requires only a single communication between the clients and the server (thus making it communication cost-effective), maintains data privacy, and promotes fairness. We present empirical evidence to demonstrate the advantages of our approach.
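As a rough illustration of the first stage, a bilevel problem of the kind described above can be written in generic form. This is only a sketch under assumed notation, none of which appears in the abstract: $D_k$ is client $k$'s local dataset, $S_k$ its synthetic dataset, $\ell$ a training loss, $\phi$ a fairness measure, and $\epsilon$ a tolerance. The paper's actual formulation may differ.

```latex
% Hypothetical notation: D_k, S_k, \ell, \phi, \epsilon are assumptions for illustration.
\begin{aligned}
\min_{S_k}\quad & \ell\bigl(\theta^\star(S_k);\, D_k\bigr) \\
\text{s.t.}\quad & \bigl|\phi\bigl(\theta^\star(S_k);\, D_k\bigr)\bigr| \le \epsilon, \\
& \theta^\star(S_k) \in \arg\min_{\theta}\ \ell(\theta;\, S_k).
\end{aligned}
```

In this reading, the lower level trains a model on the synthetic data alone, and the upper level chooses the synthetic data so that the resulting model is both accurate and fair on the real local data.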

LG, May 9

Robust Server Defense Against Unreliable Clients in One-Shot Fair Collaborative Machine Learning

Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson

Collaborative machine learning (CML) enables multiple clients to train a global model jointly in a data-distributed setting. To address data privacy and communication efficiency, one-shot CML has been increasingly adopted, where clients communicate with the server only once by sharing synthetic or processed proxy data. This single-round communication, however, eliminates the possibility of iterative correction at the server, making the learning process particularly vulnerable to client unreliability. In this setting, unreliable clients, whether malicious or non-malicious, may provide biased proxy data that favors certain groups, thereby degrading the fairness of the global model and harming minority or unprivileged groups. In this work, we propose a server-side defense framework based on a bilevel optimization formulation. The proposed approach learns client-level weights to mitigate the influence of biased client proxy data while enforcing fairness constraints by using a very small trusted root dataset available at the server. Experimental results on benchmark datasets show that our method improves fairness with little accuracy loss under biased proxy data contributions from unreliable clients. Moreover, the proposed approach remains effective even when unreliable clients make up a majority of the system, consistently outperforming other existing methods.
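One plausible generic form of such a server-side bilevel problem is sketched below. The notation is assumed for illustration and is not taken from the abstract: $w_k$ is the weight on client $k$, $P_k$ that client's proxy data, $D_0$ the trusted root dataset, $\ell$ a training loss, $\phi$ a fairness measure, and $\epsilon$ a tolerance. The paper's actual formulation may differ.

```latex
% Hypothetical notation: w_k, P_k, D_0, \ell, \phi, \epsilon are assumptions for illustration.
\begin{aligned}
\min_{w \ge 0}\quad & \ell\bigl(\theta^\star(w);\, D_0\bigr) \\
\text{s.t.}\quad & \bigl|\phi\bigl(\theta^\star(w);\, D_0\bigr)\bigr| \le \epsilon, \\
& \theta^\star(w) \in \arg\min_{\theta}\ \sum_{k=1}^{K} w_k\, \ell(\theta;\, P_k).
\end{aligned}
```

Under this sketch, a client whose proxy data pushes the model toward unfair predictions on the root dataset would tend to receive a small weight $w_k$, which limits its influence on the trained model.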
