Reconstructing Training Data from Model Gradient, Provably
A provable attack that reconstructs sensitive training data from model gradients poses a potentially severe threat to privacy in federated learning.
The paper shows that training samples can be fully reconstructed from a single gradient query at a random parameter value, even without training or memorization. The reconstruction uses a statistically and computationally efficient algorithm based on tensor decomposition.
Understanding when and how much a model gradient leaks information about the training samples is an important question in privacy. In this paper, we present a surprising result: even without training or memorizing the data, we can fully reconstruct the training samples from a single gradient query at a randomly chosen parameter value. We prove the identifiability of the training data under mild conditions, for shallow or deep neural networks and a wide range of activation functions. We also present a statistically and computationally efficient algorithm based on tensor decomposition to reconstruct the training data. As a provable attack that reveals sensitive training data, our findings suggest potentially severe threats to privacy, especially in federated learning.
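
To make the idea of gradient leakage concrete, the sketch below works through a much simpler, well-known special case: a single training sample and a two-layer network with a first-layer bias, evaluated at randomly drawn parameters. It is not the tensor decomposition algorithm described in the abstract, which handles the harder setting of multiple samples. The network form, the tanh activation, the squared loss, and the function names `single_sample_gradient` and `reconstruct_input` are all assumptions chosen for illustration.

```python
import numpy as np

def single_sample_gradient(x, y, W, b, a):
    """Gradient of 0.5 * (f(x) - y)**2 with f(x) = a . tanh(W x + b).

    Returns the gradients with respect to the first-layer weights W
    and the first-layer bias b (illustrative model, not from the paper).
    """
    z = W @ x + b                           # pre-activations, shape (m,)
    h = np.tanh(z)                          # hidden activations
    residual = a @ h - y                    # scalar prediction error
    delta = residual * a * (1.0 - h ** 2)   # dL/dz, shape (m,)
    grad_W = np.outer(delta, x)             # row k equals delta[k] * x
    grad_b = delta                          # dL/db equals dL/dz
    return grad_W, grad_b

def reconstruct_input(grad_W, grad_b):
    """Recover x by dividing one row of grad_W by the matching bias gradient."""
    k = np.argmax(np.abs(grad_b))           # pick the best-conditioned row
    return grad_W[k] / grad_b[k]

rng = np.random.default_rng(0)
d, m = 8, 16                                # input dimension, hidden width

x = rng.normal(size=d)                      # the "private" training sample
y = 1.0                                     # its label

# Randomly chosen parameters: no training has taken place.
W = rng.normal(size=(m, d))
b = rng.normal(size=m)
a = rng.normal(size=m)

grad_W, grad_b = single_sample_gradient(x, y, W, b, a)
x_rec = reconstruct_input(grad_W, grad_b)

print("max abs reconstruction error:", np.max(np.abs(x_rec - x)))
```

In this special case each row of the weight gradient is the input scaled by the corresponding bias gradient, so one division recovers the sample. With a batch of several samples the gradient is a sum of such terms and this shortcut no longer applies, which is the setting the paper's tensor decomposition approach addresses.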