Syfer: Neural Obfuscation for Private Data Release
Syfer addresses privacy concerns in healthcare data release with a method that preserves predictive utility better than differentially private approaches. The contribution appears incremental, however, since it applies existing neural network techniques to a specific domain.
The paper tackles the trade-off between privacy and predictive utility in healthcare machine learning. It develops Syfer, a neural obfuscation method that protects against re-identification attacks while preserving the ability to predict diagnoses. On X-ray classification, Syfer reaches an average AUC of 0.78, compared with 0.53 for a differentially private baseline (DP-Image) and 0.86 for the original, unencoded data.
Balancing privacy and predictive utility remains a central challenge for machine learning in healthcare. In this paper, we develop Syfer, a neural obfuscation method to protect against re-identification attacks. Syfer composes trained layers with random neural networks to encode the original data (e.g., X-rays) while maintaining the ability to predict diagnoses from the encoded data. The randomness in the encoder acts as the data owner's private key. We quantify privacy by guesswork: the number of guesses an attacker needs to re-identify a single image. We propose a contrastive learning algorithm to estimate guesswork. We show empirically that differentially private methods, such as DP-Image, obtain privacy at a significant loss of utility. In contrast, Syfer achieves strong privacy while preserving utility. For example, X-ray classifiers built with DP-Image, Syfer, and the original data achieve average AUCs of 0.53, 0.78, and 0.86, respectively.
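The guesswork metric can be illustrated with a short sketch. It assumes the attacker already has a score for every (encoded image, candidate original) pair. In the paper, such scores would come from a contrastive model, but here they are simply given as numbers. The attacker guesses candidates in decreasing order of score, and guesswork is the position at which the true original is reached. The function names, the tie-breaking rule (ties resolved in the attacker's favor), and the uniform baseline helper are illustrative assumptions, not the paper's implementation.

```python
from typing import Sequence


def guesswork_for_target(scores: Sequence[float], true_index: int) -> int:
    """Guesses needed to hit the true original when candidates are tried
    in decreasing order of attacker score (1-indexed).

    Assumption: ties are resolved in the attacker's favor, i.e. only
    candidates with a strictly higher score are guessed first.
    """
    true_score = scores[true_index]
    higher = sum(1 for s in scores if s > true_score)
    return higher + 1


def mean_guesswork(score_matrix: Sequence[Sequence[float]],
                   true_indices: Sequence[int]) -> float:
    """Average guesswork over a set of encoded images.

    Row i of score_matrix holds the attacker's scores of encoded image i
    against every candidate original; true_indices[i] is the correct match.
    """
    ranks = [guesswork_for_target(row, t)
             for row, t in zip(score_matrix, true_indices)]
    return sum(ranks) / len(ranks)


def uniform_guesswork(num_candidates: int) -> float:
    """Expected guesswork of an attacker guessing in a uniformly random
    order without repetition: (N + 1) / 2 for N candidates."""
    return (num_candidates + 1) / 2


if __name__ == "__main__":
    # Hypothetical attacker scores: 3 encoded images x 4 candidate originals.
    scores = [
        [0.9, 0.1, 0.3, 0.2],  # true match (index 0) ranked first
        [0.8, 0.4, 0.7, 0.1],  # true match (index 1) ranked third
        [0.2, 0.6, 0.1, 0.5],  # true match (index 3) ranked second
    ]
    print(mean_guesswork(scores, [0, 1, 3]))  # 2.0
    print(uniform_guesswork(4))               # 2.5
```

Higher guesswork means stronger privacy. An encoding whose mean guesswork approaches the uniform baseline leaks little identifying information to this particular attacker. Note that the estimate is only as strong as the attacker model producing the scores.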