An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
This work addresses privacy risks for users of diffusion models, particularly in text-to-speech applications, but is incremental as it builds on existing MIA methods.
The paper tackles privacy issues in diffusion models by proposing an efficient membership inference attack, the Proximal Initialization Attack (PIA), which achieves competitive performance with only two queries on image and audio tasks. It also finds that models with mel-spectrogram output are vulnerable to MIA, while those with audio output are relatively robust.
Recently, diffusion models have achieved remarkable success in generative tasks, including image and audio generation. However, like other generative models, diffusion models are prone to privacy issues. In this paper, we propose an efficient query-based membership inference attack (MIA), namely the Proximal Initialization Attack (PIA), which uses the ground-truth trajectory obtained from $\epsilon$ initialized at $t=0$, together with the predicted point, to infer membership. Experimental results indicate that the proposed method achieves competitive performance with only two queries on both discrete-time and continuous-time diffusion models. Moreover, previous works on the privacy of diffusion models have focused on vision tasks without considering audio tasks. Therefore, we also explore the robustness of diffusion models to MIA in the text-to-speech (TTS) task, which is an audio generation task. To the best of our knowledge, this work is the first to study the robustness of diffusion models to MIA in the TTS task. Experimental results indicate that models with mel-spectrogram (image-like) output are vulnerable to MIA, while models with audio output are relatively robust to MIA. Code is available at https://github.com/kong13661/PIA.
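The two-query idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the released code: it assumes a discrete-time model with the standard DDPM forward form $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$, and the names `eps_model`, `alpha_bar`, `pia_score`, `is_member`, the choice `p=4`, and the threshold are all hypothetical placeholders chosen for illustration.

```python
import numpy as np

def pia_score(eps_model, x0, t, alpha_bar, p=4):
    """Illustrative attack score (assumed form): lower values suggest a training member."""
    # Query 1: use the model's own noise prediction at t = 0 as the initial epsilon.
    eps0 = eps_model(x0, 0)
    # Build the point at step t on the trajectory defined by that epsilon
    # (assumption: standard DDPM forward process with cumulative schedule alpha_bar).
    a = alpha_bar[t]
    xt = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps0
    # Query 2: the model's noise prediction at the constructed point.
    eps_t = eps_model(xt, t)
    # Distance between the two predictions (l_p norm; p is an assumed hyperparameter).
    return float(np.sum(np.abs(eps0 - eps_t) ** p) ** (1.0 / p))

def is_member(score, threshold):
    # Hypothetical decision rule: a small discrepancy is taken as evidence of membership.
    return score < threshold

# Toy stand-in for a trained noise predictor; it records the timesteps it is queried at.
calls = []
def toy_eps_model(x, t):
    calls.append(t)
    return np.zeros_like(x)

alpha_bar = np.linspace(0.999, 0.01, 1000)  # assumed cumulative noise schedule
x0 = np.ones((3, 8, 8))                     # placeholder sample
score = pia_score(toy_eps_model, x0, t=200, alpha_bar=alpha_bar)
```

In this sketch the model is called exactly twice per sample, once at $t=0$ and once at the chosen step $t$, which is how the "only two queries" claim would be realized under these assumptions. In practice the threshold would have to be calibrated on data with known membership.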