Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models
This work addresses privacy concerns relevant to genomics researchers and data handlers by strengthening membership inference attacks, but its contribution is incremental because it builds on existing membership inference techniques.
The paper tackles the privacy risks of using language models to generate synthetic genetic data by introducing a novel hybrid membership inference attack that combines traditional black-box methods with biological metrics and achieves, on average, higher adversarial success than traditional metric-based attacks.
The increased availability of genetic data has transformed genomics research, but it has also raised many privacy concerns about how such sensitive data is handled. This work explores the use of language models (LMs) to generate synthetic genetic mutation profiles, leveraging differential privacy (DP) to protect sensitive genetic data. We empirically evaluate the privacy guarantees of our DP modes by introducing a novel Biologically-Informed Hybrid Membership Inference Attack (biHMIA), which combines traditional black-box MIA with contextual genomics metrics for enhanced attack power. Our experiments show that both small and large GPT-like transformer models are viable synthetic variant generators for small-scale genomics, and that our hybrid attack leads, on average, to higher adversarial success than traditional metric-based MIAs.
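
To make the idea of a hybrid attack concrete, the sketch below shows one possible way to blend a black-box MIA score with a biological score and to measure attack success as AUC. It is an illustration only, not the biHMIA procedure from the paper: the abstract does not say how the two signals are combined, so the min-max rescaling, the weighted sum, the `weight` parameter, and the function names `min_max`, `hybrid_scores`, and `auc` are all assumptions.

```python
from typing import Sequence


def min_max(values: Sequence[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant input maps to 0.5 everywhere."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_scores(
    mia: Sequence[float], bio: Sequence[float], weight: float = 0.5
) -> list[float]:
    """Blend two per-sample signals into one membership score.

    mia:    black-box attack scores (higher = more likely a training member)
    bio:    hypothetical biological-context scores on the same samples
    weight: share given to the biological signal (an assumed design choice)
    """
    if len(mia) != len(bio):
        raise ValueError("mia and bio must score the same samples")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    m, b = min_max(mia), min_max(bio)
    return [(1.0 - weight) * x + weight * y for x, y in zip(m, b)]


def auc(scores: Sequence[float], is_member: Sequence[bool]) -> float:
    """Probability that a random member outscores a random non-member.

    Ties count as half a win. This is the usual pairwise definition of AUC.
    """
    pos = [s for s, y in zip(scores, is_member) if y]
    neg = [s for s, y in zip(scores, is_member) if not y]
    if not pos or not neg:
        raise ValueError("need at least one member and one non-member")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, `auc(hybrid_scores(mia, bio), labels)` can be compared with `auc(mia, labels)` on a labelled evaluation set to check whether the biological signal adds attack power. A weighted sum is only one option; a classifier trained on both features would be another.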