Class Clown: Data Redaction in Machine Unlearning at Enterprise Scale
This work is significant for enterprises that train large DNNs on consumer data, as it provides a method to comply with data redaction requests without incurring the high cost and operational disruption of full model retraining.
This paper addresses the challenge of data redaction in large deep neural networks (DNNs) to comply with data privacy laws like GDPR and CCPA, which grant individuals the 'right to be forgotten'. The authors propose a DNN model lifecycle maintenance process that uses membership inference attacks to quantify privacy risk for each training data point and then implements data redaction by assigning incorrect labels during incremental model updates, thereby minimizing the need for full model retraining.
Individuals are gaining more control over their personal data through recent data privacy laws such as the General Data Protection Regulation and the California Consumer Privacy Act. One aspect of these laws is the ability to request that a business delete private information, the so-called "right to be forgotten" or "right to erasure". These laws have serious financial implications for companies and organizations that train large, highly accurate deep neural networks (DNNs) using these valuable consumer data sets. However, a redaction request poses complex technical challenges in complying with the law while still fulfilling core business operations. We introduce a DNN model lifecycle maintenance process that establishes how to handle specific data redaction requests and minimize the need to completely retrain the model. Our process is based upon the membership inference attack as a compliance tool for every point in the training set. These attack models quantify the privacy risk of all training data points and form the basis of follow-on data redaction from an accurate deployed model; excision is implemented through incorrect label assignment within incremental model updates.
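The two steps named in the abstract, scoring every training point with a membership inference attack and then redacting through incorrect labels in an incremental update, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a linear softmax classifier in place of a large DNN and a simple confidence-based attack (the probability the model assigns to the recorded label) in place of trained attack models. It also updates on the redacted points alone, and it picks the wrong label with a hypothetical "next class" rule. All function names (`fit`, `membership_score`, `redact`) are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def fit(W, X, y, n_classes, lr, steps):
    """Full-batch gradient descent on softmax cross-entropy; returns updated W."""
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(steps):
        P = softmax(X @ W)
        W = W - lr * X.T @ (P - Y) / len(X)
    return W

def membership_score(W, X, y):
    """Assumed attack: confidence on the recorded label (higher = more exposed)."""
    P = softmax(X @ W)
    return P[np.arange(len(y)), y]

def redact(W, X, y, idx, n_classes, lr=0.1, steps=100):
    """Incremental update on the redacted points with deliberately wrong labels."""
    wrong = (y[idx] + 1) % n_classes  # hypothetical rule: any label != recorded one
    return fit(W, X[idx], wrong, n_classes, lr, steps)

# Toy data standing in for a consumer data set (assumption, not from the paper).
rng = np.random.default_rng(0)
n_classes, dim = 3, 5
X = rng.normal(size=(60, dim))
y = (X @ rng.normal(size=(dim, n_classes))).argmax(axis=1)

# "Deployed" model, then a privacy-risk score for every training point.
W = fit(np.zeros((dim, n_classes)), X, y, n_classes, lr=0.5, steps=200)
risk = membership_score(W, X, y)

# Treat the three highest-risk points as the redaction request.
idx = np.argsort(risk)[-3:]
before = risk[idx]
W_redacted = redact(W, X, y, idx, n_classes)
after = membership_score(W_redacted, X[idx], y[idx])

# Accuracy on the retained points shows how much the update disturbed the model.
keep = np.setdiff1d(np.arange(len(X)), idx)
retained_acc = (softmax(X[keep] @ W_redacted).argmax(axis=1) == y[keep]).mean()

print("risk before redaction:", before.round(3))
print("risk after redaction: ", after.round(3))
print("accuracy on retained points:", round(float(retained_acc), 3))
```

In this sketch, comparing the attack score before and after the update serves as the compliance check: a redacted point should no longer look like a confidently learned training member. The accuracy on retained points gives a rough sense of the cost to the deployed model.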