Forgetting Outside the Box: Scrubbing Deep Networks of Information Accessible from Input-Output Observations
This work addresses the need for data privacy and model security in machine learning by enabling more effective forgetting of sensitive training data, although it builds incrementally on existing methods.
The paper tackles the problem of removing a trained deep network's dependency on specific training data. It introduces a procedure that improves upon previous methods and generalizes them to different readout functions, and it derives a new bound on the information about the forgotten cohort that can be extracted from black-box observations.
We describe a procedure for removing dependency on a cohort of training data from a trained deep network that improves upon previous methods, generalizes them to different readout functions, and can be extended to ensure forgetting in the activations of the network. We introduce a new bound on how much information can be extracted per query about the forgotten cohort from a black-box network for which only the input-output behavior is observed. The proposed forgetting procedure has a deterministic part derived from the differential equations of a linearized version of the model, and a stochastic part that ensures information destruction by adding noise tailored to the geometry of the loss landscape. To compute the information in the activations, we exploit the connections between the activation and weight dynamics of a DNN, inspired by Neural Tangent Kernels.
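The two-part structure described above (a deterministic update followed by geometry-aware noise) can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a ridge-regression model, for which the linearization is exact, and it assumes a noise covariance proportional to the inverse Hessian of the retained-data loss. The function names `ridge_fit` and `scrub` and the parameters `lam` and `sigma2` are hypothetical and chosen only for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimize 0.5*||Xw - y||^2 + 0.5*lam*||w||^2 in closed form."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def scrub(w, X_r, y_r, lam, sigma2, rng):
    """Toy forgetting step (illustrative assumption, not the paper's method).

    Deterministic part: one Newton step on the retained-data loss, which
    for a quadratic loss lands exactly on the retrained weights.
    Stochastic part: Gaussian noise with covariance sigma2 * H^{-1}, so
    flat directions of the loss receive more noise than sharp ones.
    """
    d = X_r.shape[1]
    H = X_r.T @ X_r + lam * np.eye(d)       # Hessian of the retained loss
    grad = H @ w - X_r.T @ y_r              # gradient of the retained loss at w
    w_det = w - np.linalg.solve(H, grad)    # deterministic shift
    L = np.linalg.cholesky(H)               # H = L @ L.T
    z = rng.standard_normal(d)
    noise = np.sqrt(sigma2) * np.linalg.solve(L.T, z)  # covariance sigma2 * H^{-1}
    return w_det, w_det + noise

# Synthetic data: the first 20 samples form the cohort to forget.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
w_true = rng.standard_normal(5)
y = X @ w_true + 0.1 * rng.standard_normal(200)
X_r, y_r = X[20:], y[20:]

lam = 1e-2
w_full = ridge_fit(X, y, lam)           # trained on all data
w_retrain = ridge_fit(X_r, y_r, lam)    # reference: retrained without the cohort
w_det, w_scrubbed = scrub(w_full, X_r, y_r, lam, sigma2=1e-3, rng=rng)
```

In this quadratic setting `w_det` coincides with `w_retrain`, so the noise is not needed for correctness. For a deep network the linearized update is only approximate, which is the situation in which the abstract's stochastic part is meant to destroy whatever information about the cohort remains.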