Privacy Auditing of Multi-domain Graph Pre-trained Model under Membership Inference Attacks
This work addresses privacy risks for users of multi-domain graph pre-trained models, which are increasingly used in graph foundation models. Its contribution is incremental, however, because it adapts existing membership inference attack (MIA) techniques to a specific domain.
The paper tackles the challenge of conducting membership inference attacks (MIAs) on multi-domain graph pre-trained models, which are difficult to attack because of reduced overfitting and diverse training data. It proposes MGP-MIA, a framework that amplifies membership signals, constructs shadow models incrementally, and uses similarity-based inference. The reported experiments show that the attack is effective.
Multi-domain graph pre-training has emerged as a pivotal technique in developing graph foundation models. While it greatly improves the generalization of graph neural networks, its privacy risks under membership inference attacks (MIAs), which aim to identify whether a specific instance was used in training (i.e., is a member), remain largely unexplored. Conducting effective MIAs against multi-domain graph pre-trained models is challenging for three reasons: (i) Enhanced Generalization Capability: Multi-domain pre-training reduces the overfitting characteristics commonly exploited by MIAs. (ii) Unrepresentative Shadow Datasets: The diversity of the training graphs makes it hard to obtain reliable shadow graphs. (iii) Weakened Membership Signals: Embedding-based outputs offer less informative cues than logits for MIAs. To tackle these challenges, we propose MGP-MIA, a novel framework for Membership Inference Attacks against Multi-domain Graph Pre-trained models. Specifically, we first propose a membership signal amplification mechanism that amplifies the overfitting characteristics of target models via machine unlearning. We then design an incremental shadow model construction mechanism that builds a reliable shadow model from limited shadow graphs via incremental learning. Finally, we introduce a similarity-based inference mechanism that identifies members based on their similarity to positive and negative samples. Extensive experiments demonstrate the effectiveness of our proposed MGP-MIA and reveal the privacy risks of multi-domain graph pre-training.
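To make the similarity-based inference step more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the similarity measure, how positive and negative samples are obtained, or the decision rule. The sketch therefore assumes cosine similarity, assumes the positive and negative samples are embeddings of known members and non-members (for example, taken from a shadow model), and assumes a simple margin threshold. The names `cosine`, `mean_similarity`, `infer_membership`, and `margin` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_similarity(query, references):
    """Average cosine similarity between a query embedding and a list of reference embeddings."""
    return sum(cosine(query, r) for r in references) / len(references)


def infer_membership(query, positives, negatives, margin=0.0):
    """Predict 'member' when the query is closer to the positive samples than to the negatives.

    Assumption: `positives` are embeddings of known members and `negatives`
    are embeddings of known non-members; `margin` is a hypothetical threshold.
    Returns (is_member, score).
    """
    score = mean_similarity(query, positives) - mean_similarity(query, negatives)
    return score > margin, score


# Toy two-dimensional embeddings, purely illustrative.
positives = [[1.0, 0.1], [0.9, 0.2]]
negatives = [[0.1, 1.0], [0.2, 0.9]]

is_member, score = infer_membership([1.0, 0.0], positives, negatives)
print(is_member, round(score, 3))
```

In this toy setup the query embedding points in nearly the same direction as the positive samples, so the score is positive and the query is flagged as a member. A real attack would use embeddings produced by the target model and would likely tune the threshold on shadow data.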