On Reconstructing Training Data From Bayesian Posteriors and Trained Models
The paper addresses a major security problem for machine learning practitioners and users by exposing vulnerabilities in model releases, though its extension of reconstruction attacks to Bayesian models is incremental.
The paper tackles the vulnerability of trained models to training data reconstruction attacks by establishing a mathematical framework and developing a score matching method for reconstructing data from Bayesian posteriors, which is the first such method in the literature.
Publicly releasing the specification of a model together with its trained parameters means an adversary can attempt to reconstruct information about the training data via training data reconstruction attacks, a major vulnerability of modern machine learning methods. This paper makes three primary contributions: establishing a mathematical framework to express the problem; characterising the features of the training data that are vulnerable via a maximum mean discrepancy equivalence; and outlining a score matching framework for reconstructing data in both Bayesian and non-Bayesian models, the former being a first in the literature.
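For readers unfamiliar with maximum mean discrepancy (MMD), the following is a minimal, generic sketch of how an MMD estimate compares two samples. It is not the paper's equivalence result or its attack: the Gaussian kernel, the bandwidth of 1.0, the synthetic data, and all names (`rbf_kernel`, `mmd_squared`, `train`, `good_guess`, `bad_guess`) are assumptions made purely for illustration.

```python
import math
import random


def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples xs and ys."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


# Synthetic stand-ins (hypothetical): a "training set" and two candidate reconstructions.
rng = random.Random(0)
train = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
good_guess = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(50)]
bad_guess = [[rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)] for _ in range(50)]

mmd_same = mmd_squared(train, train)       # identical samples
mmd_good = mmd_squared(train, good_guess)  # same distribution, different draws
mmd_bad = mmd_squared(train, bad_guess)    # shifted distribution

print(mmd_same, mmd_good, mmd_bad)
```

In this sketch a candidate set drawn from the same distribution as the training data yields a much smaller MMD estimate than a shifted one, which gives some intuition for why MMD is a natural way to describe which features of the training data a reconstruction can match.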