PASS: Private Attributes Protection with Stochastic Data Substitution
This work addresses privacy risks for users of ML services by offering a method that is more robust to adversarial attacks, though it is an incremental improvement over existing techniques.
The paper tackles the problem of protecting private attributes in machine learning data while preserving utility, and proposes PASS, a stochastic substitution method that overcomes vulnerabilities in existing adversarial approaches, achieving effective protection across multiple datasets.
Growing Machine Learning (ML) services require extensive collections of user data, which may inadvertently include people's private information that is irrelevant to the services. Various methods have been proposed to protect private attributes by removing them from the data while maintaining the data's utility for downstream tasks. Nevertheless, as we show theoretically and empirically in the paper, these methods are severely vulnerable because of a common weakness rooted in their adversarial-training-based strategies. To overcome this limitation, we propose a novel approach, PASS, which stochastically substitutes the original sample with another one according to certain probabilities and is trained with a novel loss function soundly derived from an information-theoretic objective defined for utility-preserving private attribute protection. A comprehensive evaluation of PASS on datasets of different modalities, including facial images, human activity sensory signals, and voice recordings, substantiates PASS's effectiveness and generalizability.
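To make the idea of stochastic substitution concrete, the following is a minimal sketch of the sampling step only, not the authors' implementation. It assumes a hypothetical matrix `probs` in which row `i` holds the probability of replacing original sample `i` with each candidate in a substitute pool. In PASS these probabilities are learned with the paper's loss function; here they are hand-picked placeholders, and the function name `substitute` is likewise illustrative.

```python
import numpy as np


def substitute(sample_indices, probs, rng):
    """Draw one substitute index per original sample.

    probs[i, j] is the (assumed, hypothetical) probability of replacing
    original sample i with candidate j from the substitute pool.
    """
    probs = np.asarray(probs, dtype=float)
    if not np.allclose(probs.sum(axis=1), 1.0):
        raise ValueError("each row of probs must sum to 1")
    n_pool = probs.shape[1]
    # Sample a candidate index independently for every requested original.
    return np.array([rng.choice(n_pool, p=probs[i]) for i in sample_indices])


rng = np.random.default_rng(0)

# Placeholder probabilities: 3 original samples, 4 candidate substitutes.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],   # mostly mapped to candidate 0
    [0.00, 1.00, 0.00, 0.00],   # always mapped to candidate 1
    [0.25, 0.25, 0.25, 0.25],   # uniformly random substitute
])

# Originals 0, 1, 2, 1 are each replaced by a sampled pool index.
subs = substitute([0, 1, 2, 1], probs, rng)
```

In this sketch the downstream service would receive the pool samples indexed by `subs` instead of the originals. How the probabilities are trained to hide private attributes while keeping useful ones is the contribution of the paper and is not reproduced here.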