On the Importance of Conditioning for Privacy-Preserving Data Augmentation
This work addresses privacy risks in AI-based data augmentation and reveals a critical vulnerability in existing methods. The contribution is incremental in that it exposes flaws in prior approaches.
The paper examines the use of conditioned latent diffusion models for privacy-preserving data augmentation and shows that such models are not suitable for anonymization, because contrastive learning and black-box attacks can exploit them to identify individuals.
Latent diffusion models can be used as a powerful augmentation method to artificially extend datasets for enhanced training. To the human eye, these augmented images look very different from the originals. Previous work has suggested using this data augmentation technique for data anonymization. However, we show that latent diffusion models that are conditioned on features such as depth maps or edges to guide the diffusion process are not suitable as a privacy-preserving method. We use a contrastive learning approach to train a model that can correctly identify people out of a pool of candidates. Moreover, we demonstrate that anonymization using conditioned diffusion models is susceptible to black-box attacks. We attribute the success of the described methods to the conditioning of the latent diffusion model in the anonymization process: the diffusion model is instructed to produce similar edges for the anonymized images, so a model can learn to recognize these patterns for identification.
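The identification step described above can be illustrated with a minimal sketch. It assumes that an encoder (not shown) has already mapped anonymized and original images to embedding vectors, and it uses an InfoNCE-style contrastive loss, which is one common choice and not necessarily the loss used in this work. The function names `info_nce_loss` and `identify`, the temperature value, and the synthetic embeddings are all hypothetical and serve only as an illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(anon_emb, orig_emb, temperature=0.1):
    """InfoNCE-style loss (assumed here, not taken from the paper).

    Row i of anon_emb and row i of orig_emb are treated as a positive pair;
    all other rows of orig_emb act as negatives for row i.
    """
    a = l2_normalize(anon_emb)
    o = l2_normalize(orig_emb)
    logits = a @ o.T / temperature
    # Subtract the row-wise maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def identify(anon_emb, candidate_emb):
    """For each anonymized embedding, return the index of the most similar candidate."""
    sims = l2_normalize(anon_emb) @ l2_normalize(candidate_emb).T
    return sims.argmax(axis=1)


if __name__ == "__main__":
    # Synthetic stand-ins for encoder outputs: each anonymized embedding is a
    # lightly perturbed copy of its original, mimicking shared edge structure.
    rng = np.random.default_rng(0)
    candidates = rng.normal(size=(5, 16))
    anonymized = candidates + 0.05 * rng.normal(size=(5, 16))
    print("predicted identities:", identify(anonymized, candidates))
    print("contrastive loss:", info_nce_loss(anonymized, candidates))
```

In this sketch, training an encoder to minimize such a loss would pull each anonymized image toward its original in embedding space. If the conditioning preserves edge patterns, a nearest-neighbor lookup over the candidate pool may then be enough to recover the identity.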