Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
This work addresses membership privacy preservation for neural network users and offers a more efficient alternative to updating or retraining all weights. The contribution is incremental, since it builds on existing defenses against membership inference attacks.
The paper tackles the problem that privacy preservation in neural networks is costly and tends to degrade utility. It finds that privacy vulnerability is concentrated in a small fraction of critical weights, and it proposes rewinding only those weights for fine-tuning. This improves resilience against membership inference attacks while maintaining utility.
Prior approaches to membership privacy preservation usually update or retrain all weights in a neural network, which is costly and can lead to unnecessary utility loss or even more serious misalignment in predictions between training data and non-training data. In this work, we report three insights: i) privacy vulnerability exists in a very small fraction of weights; ii) however, most of those weights also critically impact utility performance; iii) the importance of weights stems from their locations rather than their values. Based on these insights, to preserve privacy, we score critical weights and, instead of discarding those neurons, rewind only the weights for fine-tuning. Through extensive experiments, we show that this mechanism in most cases yields stronger resilience against Membership Inference Attacks while maintaining utility.
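To make the score-and-rewind idea above concrete, here is a minimal NumPy sketch. It rests on assumptions the text does not settle: "rewind" is read as resetting the selected weights to their values at initialization, the score is a placeholder (absolute change during training) and not the paper's criterion, and the function names `select_critical_mask` and `rewind_critical` are hypothetical.

```python
import numpy as np


def select_critical_mask(scores, fraction):
    """Return a boolean mask marking the top `fraction` of entries by score."""
    k = max(1, int(round(fraction * scores.size)))
    # Indices of the k largest scores in the flattened array.
    top = np.argpartition(scores.ravel(), -k)[-k:]
    mask = np.zeros(scores.size, dtype=bool)
    mask[top] = True
    return mask.reshape(scores.shape)


def rewind_critical(trained, initial, mask):
    """Reset masked positions to their initial values; keep all other weights."""
    return np.where(mask, initial, trained)


rng = np.random.default_rng(0)
initial = rng.normal(size=(4, 5))                        # weights at initialization
trained = initial + rng.normal(scale=0.5, size=(4, 5))   # weights after training

# Placeholder score (assumption): how far each weight moved during training.
scores = np.abs(trained - initial)

mask = select_critical_mask(scores, fraction=0.1)        # 2 of 20 positions
rewound = rewind_critical(trained, initial, mask)
```

Fine-tuning would then start from `rewound`. Because the mask records positions rather than values, it matches the third insight, that a weight's importance comes from where it sits in the network. Whether fine-tuning updates only the masked positions or all weights is not specified in the text above, so the sketch leaves that step out.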