LG · CR · CV · NE · ML · Jun 15, 2022

Reconstructing Training Data from Trained Neural Networks

arXiv:2206.07758v3 · 179 citations · h-index: 72
Originality: Highly original
AI Analysis

This result reveals a privacy vulnerability in neural networks: an attacker with access to a trained model's parameters may be able to recover sensitive training data.

The paper studies how much neural networks memorize their training data. It shows that, in some cases, a significant fraction of the training samples can be reconstructed from the parameters of a trained classifier, which has negative implications for privacy.

Understanding to what extent neural networks memorize training data is an intriguing question with practical and theoretical implications. In this paper we show that in some cases a significant fraction of the training data can in fact be reconstructed from the parameters of a trained neural network classifier. We propose a novel reconstruction scheme that stems from recent theoretical results about the implicit bias in training neural networks with gradient-based methods. To the best of our knowledge, our results are the first to show that reconstructing a large portion of the actual training samples from a trained neural network classifier is generally possible. This has negative implications on privacy, as it can be used as an attack for revealing sensitive training data. We demonstrate our method for binary MLP classifiers on a few standard computer vision datasets.
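The abstract does not spell out the reconstruction objective, so the following is only a minimal sketch of the idea it points to. It assumes the implicit-bias result in question is the one stating that homogeneous networks trained with gradient methods on a logistic-type loss tend toward a KKT point of the max-margin problem, whose stationarity condition writes the parameters as a non-negative combination of per-sample gradients. A linear model stands in for the paper's MLPs, and every name here (kkt_residual, theta, lams) is hypothetical rather than taken from the authors' code.

```python
# Illustrative sketch only: a linear model replaces the paper's MLP classifiers,
# and the function and variable names are hypothetical.

def kkt_residual(theta, xs, ys, lams):
    """Squared distance between theta and sum_i lam_i * y_i * x_i.

    For a linear model phi(theta; x) = <theta, x>, the gradient of phi with
    respect to theta is x itself, so the assumed stationarity condition reads
    theta = sum_i lam_i * y_i * x_i with every lam_i >= 0.
    A reconstruction attack would search for candidate points xs (and
    coefficients lams) that drive this residual toward zero.
    """
    combo = [0.0] * len(theta)
    for x, y, lam in zip(xs, ys, lams):
        for j, xj in enumerate(x):
            combo[j] += lam * y * xj
    return sum((t - c) ** 2 for t, c in zip(theta, combo))


# Toy "training set" with labels in {+1, -1} and assumed coefficients.
true_xs = [[1.0, 0.0], [0.0, 1.0]]
ys = [1, -1]
lams = [2.0, 3.0]

# Parameters built to satisfy the stationarity condition exactly.
theta = [2.0, -3.0]

# The true samples give zero residual; a perturbed candidate does not.
perturbed_xs = [[1.0, 0.5], [0.0, 1.0]]
residual_true = kkt_residual(theta, true_xs, ys, lams)
residual_perturbed = kkt_residual(theta, perturbed_xs, ys, lams)
```

In the linear case many different candidate sets satisfy the condition, so this toy only shows what the residual measures. It does not demonstrate that samples are recoverable, which the paper reports for binary MLP classifiers.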

Code Implementations: 1 repo
