LG · CR · Aug 6, 2020

Data Minimization for GDPR Compliance in Machine Learning Models

Abigail Goldsteen, Gilad Ezov, Ron Shmelkin, Micha Moffie, Ariel Farkash

arXiv:2008.04113v113.677 citations · Has Code

Originality: Highly original

AI Analysis

This work addresses GDPR compliance for creators and users of machine learning models, offering a novel approach to a known bottleneck: determining how little personal data a model actually needs.

The paper tackles the problem of determining the minimal personal data required for predictions under GDPR by presenting a method to reduce input features with little to no accuracy loss, enabling provable data minimization.

The EU General Data Protection Regulation (GDPR) mandates the principle of data minimization, which requires that only data necessary to fulfill a certain purpose be collected. However, it can often be difficult to determine the minimal amount of data required, especially in complex machine learning models such as neural networks. We present a first-of-a-kind method to reduce the amount of personal data needed to perform predictions with a machine learning model, by removing or generalizing some of the input features. Our method makes use of the knowledge encoded within the model to produce a generalization that has little to no impact on its accuracy. This enables the creators and users of machine learning models to achieve data minimization in a provable manner.
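To make the idea of generalizing input features concrete, here is a minimal sketch. It is not the authors' algorithm: the `model` decision rule, the `generalize` bucketing scheme, the `agreement` metric, and all thresholds and bucket widths are hypothetical, chosen only for illustration. The sketch replaces each exact value with the midpoint of a fixed-width range and measures how often the model's prediction stays the same.

```python
import random


def model(age, income):
    """Hypothetical stand-in for a trained model: a fixed decision rule."""
    return 1 if (age >= 40 and income >= 50_000) else 0


def generalize(value, width):
    """Replace a value with the midpoint of its bucket [k*width, (k+1)*width)."""
    return (value // width) * width + width / 2


def agreement(records, age_width, income_width):
    """Fraction of records whose prediction is unchanged after generalization."""
    same = sum(
        model(age, income)
        == model(generalize(age, age_width), generalize(income, income_width))
        for age, income in records
    )
    return same / len(records)


# Synthetic records (assumed data, not from the paper).
random.seed(0)
records = [
    (random.randint(18, 90), random.randint(10_000, 150_000))
    for _ in range(1000)
]

# Buckets aligned with the model's decision boundaries lose no accuracy;
# misaligned buckets change some predictions.
for age_width in (10, 20, 30):
    print(age_width, agreement(records, age_width, 25_000))
```

In this toy setting, a data collector could store only the bucket a person falls into (for example, an age range of 40 to 49) rather than the exact value, provided the measured agreement stays at an acceptable level. Which ranges are acceptable depends on the model, which is why the abstract stresses using the knowledge encoded within the model itself.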
