CV CR LG Jun 12, 2021

Disrupting Model Training with Adversarial Shortcuts

arXiv:2106.06654v2 8 citations
Originality: Highly original
AI Analysis

This addresses data privacy concerns for data owners by making publicly available data unusable for training effective ML models.

The paper tackles the problem of preventing unauthorized machine learning usage of publicly released data by proposing adversarial shortcuts that disrupt model training, demonstrating that these measures successfully prevent deep learning models from achieving high accuracy on real data examples.

When data is publicly released for human consumption, it is unclear how to prevent its unauthorized usage for machine learning purposes. Successful model training may be preventable with carefully designed dataset modifications, and we present a proof-of-concept approach for the image classification setting. We propose methods based on the notion of adversarial shortcuts, which encourage models to rely on non-robust signals rather than semantic features, and our experiments demonstrate that these measures successfully prevent deep learning models from achieving high accuracy on real, unmodified data examples.
