LG · Oct 12, 2023

When Machine Learning Models Leak: An Exploration of Synthetic Training Data

arXiv:2310.08775v3 · 5 citations · h-index: 6
Originality: Incremental advance
AI Analysis

The paper addresses privacy risks that machine learning models pose to individuals, but its contribution is incremental because it builds on existing attack frameworks.

The paper studies sensitive attribute inference attacks on a propensity-to-move classifier. It examines how training the model on synthetic rather than original data affects attack success and finds that synthetic data can reduce, but not eliminate, the inference risk.

We investigate an attack on a machine learning model that predicts whether a person or household will relocate in the next two years, i.e., a propensity-to-move classifier. The attack assumes that the attacker can query the model to obtain predictions and that the marginal distribution of the data on which the model was trained is publicly available. The attack also assumes that the attacker has obtained the values of non-sensitive attributes for a certain number of target individuals. The objective of the attack is to infer the values of sensitive attributes for these target individuals. We explore how replacing the original data with synthetic data when training the model impacts how successfully the attacker can infer sensitive attributes.
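The threat model above can be illustrated with a minimal sketch of one common attribute-inference strategy (a model-inversion style guess). It assumes, beyond what the summary states, that the attacker also knows each target's true label, so that a candidate sensitive value is "consistent" when querying the model with it reproduces that label; among consistent candidates the attacker picks the one with the highest probability under the public marginal. The model, attribute names, and values (`toy_model`, `age_group`, `income`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Sequence, Tuple

Record = Dict[str, Hashable]
Model = Callable[[Record], Hashable]


def marginal(records: Sequence[Record], attr: str) -> Dict[Hashable, float]:
    """Empirical marginal of one attribute (stands in for the published marginals)."""
    counts = Counter(r[attr] for r in records)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def infer_sensitive(model: Model, known: Record, true_label: Hashable,
                    sensitive_attr: str, prior: Dict[Hashable, float]) -> Hashable:
    """Among candidate values whose query reproduces the target's (assumed known)
    label, return the one with the highest prior probability."""
    consistent: Dict[Hashable, float] = {}
    for value, p in prior.items():
        query = {**known, sensitive_attr: value}  # fill in the unknown attribute
        if model(query) == true_label:             # one black-box query per candidate
            consistent[value] = p
    pool = consistent or prior                     # fall back to the prior if nothing matches
    return max(pool, key=pool.get)


def attack_success(model: Model,
                   targets: Sequence[Tuple[Record, Hashable, Hashable]],
                   sensitive_attr: str, prior: Dict[Hashable, float]) -> float:
    """Fraction of targets whose sensitive value is inferred correctly.
    Each target is (non-sensitive attributes, true label, true sensitive value)."""
    hits = 0
    for known, label, secret in targets:
        hits += infer_sensitive(model, known, label, sensitive_attr, prior) == secret
    return hits / len(targets)


# Hypothetical usage: a hand-written rule stands in for a trained classifier.
def toy_model(r: Record) -> str:
    return "move" if r["age_group"] == "18-34" and r["income"] == "low" else "stay"


population = [{"income": "low"}] * 4 + [{"income": "high"}] * 6
prior = marginal(population, "income")  # {"low": 0.4, "high": 0.6}
targets = [
    ({"age_group": "18-34"}, "move", "low"),
    ({"age_group": "35-64"}, "stay", "low"),   # both values give "stay", so the prior decides
    ({"age_group": "18-34"}, "stay", "high"),
]
print(attack_success(toy_model, targets, "income", prior))
```

To mirror the comparison the paper describes, one would run `attack_success` against a model trained on the original data and against one trained on synthetic data, and compare the two rates; the sketch does not reproduce the paper's actual attack or results.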

