Alleviating Privacy Attacks via Causal Learning
This work addresses privacy risks for users of black-box ML models, particularly when deployed on out-of-distribution data, by proposing a causal learning approach that offers stronger theoretical guarantees and empirical robustness.
The paper tackles the problem of privacy attacks like membership inference on machine learning models by showing that models based on causal relationships generalize better to unseen data distributions. Experiments on simulated Bayesian networks and colored-MNIST show that causal models reduce attack accuracy to near random guess levels, compared to up to 80% for associational models.
Machine learning models, especially deep neural networks have been shown to be susceptible to privacy attacks such as membership inference where an adversary can detect whether a data point was used for training a black-box model. Such privacy risks are exacerbated when a model's predictions are used on an unseen data distribution. To alleviate privacy attacks, we demonstrate the benefit of predictive models that are based on the causal relationships between input features and the outcome. We first show that models learnt using causal structure generalize better to unseen data, especially on data from different distributions than the train distribution. Based on this generalization property, we establish a theoretical link between causality and privacy: compared to associational models, causal models provide stronger differential privacy guarantees and are more robust to membership inference attacks. Experiments on simulated Bayesian networks and the colored-MNIST dataset show that associational models exhibit upto 80% attack accuracy under different test distributions and sample sizes whereas causal models exhibit attack accuracy close to a random guess.