CVJul 16, 2024

Model Inversion Attacks Through Target-Specific Conditional Diffusion Models

Ouxiang Li, Yanbin Hao, Zhicai Wang, Bin Zhu, Shuo Wang, Zaixi Zhang, Fuli Feng

arXiv:2407.11424v28.710 citationsh-index: 8Has Code

Originality Incremental advance

AI Analysis

This addresses privacy concerns in AI by improving attack fidelity, but it is incremental as it builds on existing diffusion models for a specific security domain.

The paper tackles model inversion attacks by proposing Diff-MI, a diffusion-based method that reconstructs private images from a target classifier, achieving an average 20% decrease in FID while maintaining competitive attack accuracy.

Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. To alleviate these issues, leveraging on diffusion models' remarkable synthesis capabilities, we propose Diffusion-based Model Inversion (Diff-MI) attacks. Specifically, we introduce a novel target-specific conditional diffusion model (CDM) to purposely approximate target classifier's private distribution and achieve superior accuracy-fidelity balance. Our method involves a two-step learning paradigm. Step-1 incorporates the target classifier into the entire CDM learning under a pretrain-then-finetune fashion, with creating pseudo-labels as model conditions in pretraining and adjusting specified layers with image predictions in fine-tuning. Step-2 presents an iterative image reconstruction method, further enhancing the attack performance through a combination of diffusion priors and target knowledge. Additionally, we propose an improved max-margin loss that replaces the hard max with top-k maxes, fully leveraging feature information and soft labels from the target classifier. Extensive experiments demonstrate that Diff-MI significantly improves generative fidelity with an average decrease of 20\% in FID while maintaining competitive attack accuracy compared to state-of-the-art methods across various datasets and models. Our code is available at: \url{https://github.com/Ouxiang-Li/Diff-MI}.

View on arXiv PDF Code

Similar