Learning Privacy Preserving Encodings through Adversarial Training
This addresses privacy concerns in image data sharing by providing a robust encoding framework, though it is incremental as it builds on adversarial optimization techniques.
The authors tackled the problem of learning privacy-preserving image encodings that prevent inference of private attributes while preserving utility, achieving encoders resilient to post-hoc classifier attacks through a stable adversarial training method.
We present a framework to learn privacy-preserving encodings of images that inhibit inference of chosen private attributes, while allowing recovery of other desirable information. Rather than simply inhibiting a given fixed pre-trained estimator, our goal is that an estimator be unable to learn to accurately predict the private attributes even with knowledge of the encoding function. We use a natural adversarial optimization-based formulation for this---training the encoding function against a classifier for the private attribute, with both modeled as deep neural networks. The key contribution of our work is a stable and convergent optimization approach that is successful at learning an encoder with our desired properties---maintaining utility while inhibiting inference of private attributes, not just within the adversarial optimization, but also by classifiers that are trained after the encoder is fixed. We adopt a rigorous experimental protocol for verification wherein classifiers are trained exhaustively till saturation on the fixed encoders. We evaluate our approach on tasks of real-world complexity---learning high-dimensional encodings that inhibit detection of different scene categories---and find that it yields encoders that are resilient at maintaining privacy.