LG · Feb 26

Mitigating Membership Inference in Intermediate Representations via Layer-wise MIA-risk-aware DP-SGD

Jiayang Meng, Tao Huang, Chen Hou, Guolong Zheng, Hong Chen

arXiv:2602.22611v1 · 1.4 · h-index: 7

Originality: Incremental advance

AI Analysis

This work is significant for researchers and practitioners concerned with privacy leakage from intermediate representations in machine learning models, offering an incremental improvement to DP-SGD.

This paper addresses the problem of Membership Inference Attacks (MIAs) on Intermediate Representations (IRs) in pre-trained models, where existing DP-SGD methods apply uniform privacy protection. The authors propose Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers based on their MIA risk, resulting in reduced peak IR-level MIA risk while preserving utility under the same privacy budget.

In Embedding-as-an-Interface (EaaI) settings, pre-trained models are queried for Intermediate Representations (IRs). The distributional properties of IRs can leak training-set membership signals, enabling Membership Inference Attacks (MIAs) whose strength varies across layers. Although Differentially Private Stochastic Gradient Descent (DP-SGD) mitigates such leakage, existing implementations employ per-example gradient clipping and a uniform, layer-agnostic noise multiplier, ignoring heterogeneous layer-wise MIA vulnerability. This paper introduces Layer-wise MIA-risk-aware DP-SGD (LM-DP-SGD), which adaptively allocates privacy protection across layers in proportion to their MIA risk. Specifically, LM-DP-SGD trains a shadow model on a public shadow dataset, extracts per-layer IRs from its train/test splits, and fits layer-specific MIA adversaries, using their attack error rates as MIA-risk estimates. Leveraging the cross-dataset transferability of MIAs, these estimates are then used to reweight each layer's contribution to the globally clipped gradient during private training, providing layer-appropriate protection under a fixed noise magnitude. We further establish theoretical guarantees on both privacy and convergence of LM-DP-SGD. Extensive experiments show that, under the same privacy budget, LM-DP-SGD reduces the peak IR-level MIA risk while preserving utility, yielding a superior privacy-utility trade-off.
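The reweighting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the mapping from attack error rates to layer weights, so the rule below (weight equals error rate divided by the largest error rate) is an assumption, as are all function names. The sketch scales each layer's per-example gradient by a fixed weight, clips the global L2 norm, and adds Gaussian noise with a single standard deviation shared by all layers. Because the weights come from public shadow data and clipping is applied after reweighting, each example's contribution still has norm at most `clip_norm`, as in standard DP-SGD.

```python
import math
import random


def layer_weights(attack_error_rates):
    """Map per-layer attack error rates to weights in (0, 1].

    Assumption (not from the paper): a layer whose adversary has a LOW
    error rate is treated as high-risk and receives a SMALLER weight, so
    its clipped signal shrinks relative to the fixed noise.
    """
    top = max(attack_error_rates)
    if top <= 0:
        raise ValueError("at least one attack error rate must be positive")
    return [e / top for e in attack_error_rates]


def clip_reweighted(per_layer_grads, weights, clip_norm):
    """Scale each layer's gradient by its weight, then clip the global L2 norm."""
    scaled = [[w * g for g in layer] for layer, w in zip(per_layer_grads, weights)]
    norm = math.sqrt(sum(g * g for layer in scaled for g in layer))
    factor = min(1.0, clip_norm / (norm + 1e-12))
    return [[factor * g for g in layer] for layer in scaled]


def noisy_batch_gradient(batch_grads, weights, clip_norm, noise_multiplier, rng):
    """Sum clipped per-example gradients, add Gaussian noise, then average."""
    clipped = [clip_reweighted(ex, weights, clip_norm) for ex in batch_grads]
    std = noise_multiplier * clip_norm  # one noise scale for every layer
    averaged = []
    for l in range(len(weights)):
        dim = len(clipped[0][l])
        noisy_sum = [
            sum(ex[l][j] for ex in clipped) + rng.gauss(0.0, std)
            for j in range(dim)
        ]
        averaged.append([v / len(batch_grads) for v in noisy_sum])
    return averaged
```

In this toy form, gradients are nested lists indexed as `batch_grads[example][layer][coordinate]`. A real training loop would use per-example gradients from a deep learning framework and a privacy accountant, neither of which is shown here.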
