LG · AI · May 15

Membership Inference Attacks on Discrete Diffusion Language Models

arXiv:2605.16445 · 4.8

Predicted impact: top 64% in LG · last 90 days

Originality: Incremental advance

AI Analysis

This work reveals that discrete diffusion models are more vulnerable to membership inference than previously thought, posing a privacy risk for practitioners deploying fine-tuned MDLMs.

Membership inference attacks on fine-tuned Masked Diffusion Language Models (MDLMs) are shown to be significantly more effective than previous grey-box baselines, achieving a mean AUC of 0.878 across six text domains (peaking at 0.930 on Pile CC) and outperforming the SAMA baseline by 0.062 AUC. A shadow model transfer attack with K=3 surrogate models achieves 0.858 mean AUC, within 0.020 of the white-box oracle.

Masked Diffusion Language Models (MDLMs) replace autoregressive generation with iterative demasking, and their privacy properties are largely unstudied. We study membership inference attacks (MIAs) on fine-tuned MDLMs and show that they are significantly more vulnerable than current grey-box baselines suggest. We extract a 46-dimensional feature vector from the model's reconstruction loss at four masking ratios and train XGBoost and MLP classifiers on top. On the MIMIR benchmark, across six text domains, XGBoost achieves a mean AUC of 0.878, peaking at 0.930 on Pile CC, and beats the SAMA grey-box baseline by 0.062 AUC on average. A leave-one-signal-out ablation shows that the ELBO trajectory alone drives most of this, with a mean drop of 0.130 when removed, while attention features add almost nothing (below 0.003). We also design a shadow model transfer attack in which K=3 surrogate MDLMs, trained on data from unrelated domains, generate classifier labels with no access to the target domain. This achieves 0.858 mean AUC, within 0.020 of the white-box oracle, and establishes shadow model transfer as a practical and nearly as effective attack path.
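As a rough illustration of the loss-based attack idea in the abstract, the sketch below scores examples by their reconstruction loss at four masking ratios and measures attack quality with AUC. It is a toy under stated assumptions, not the paper's pipeline: the masking ratios, the synthetic loss model, and the helper names (`toy_reconstruction_loss`, `loss_features`, `membership_score`) are all hypothetical, and a simple negative-mean-loss score stands in for the XGBoost and MLP classifiers trained on the 46-dimensional features.

```python
import random

# Assumed values: the abstract only says "four masking ratios".
MASK_RATIOS = (0.2, 0.4, 0.6, 0.8)


def toy_reconstruction_loss(is_member, ratio, rng):
    """Hypothetical stand-in for an MDLM's masked-token reconstruction loss.

    Assumption: fine-tuning members are reconstructed slightly better
    (lower loss) than non-members at every masking ratio.
    """
    base = 1.0 + 2.0 * ratio            # more masking -> harder reconstruction
    gap = 0.3 if is_member else 0.0     # assumed member/non-member gap
    return base - gap + rng.gauss(0.0, 0.4)


def loss_features(is_member, rng):
    """One loss value per masking ratio (a tiny analogue of a feature vector)."""
    return [toy_reconstruction_loss(is_member, r, rng) for r in MASK_RATIOS]


def membership_score(features):
    """Higher score means 'more likely a member': negate the mean loss."""
    return -sum(features) / len(features)


def roc_auc(scores, labels):
    """AUC as the probability that a random member outscores a random
    non-member, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
labels = [1] * 500 + [0] * 500          # 1 = member, 0 = non-member
features = [loss_features(bool(y), rng) for y in labels]
scores = [membership_score(f) for f in features]
auc = roc_auc(scores, labels)
print(f"toy AUC: {auc:.3f}")
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation, which is the scale on which the reported 0.878 and 0.858 figures should be read. In a real attack the synthetic loss function would be replaced by queries to the target model, and the hand-written score by a trained classifier.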

