Dirty and Clean-Label attack detection using GAN discriminators
This work addresses the challenge of securing computer vision (CV) models against data poisoning attacks without manual inspection or retraining, though it is incremental in that it builds on existing GAN-based methods.
The paper tackles the problem of detecting dirty-label and clean-label poisoning attacks in computer vision models by using GAN discriminators to protect a single class, achieving 100% detection of the tested poison at perturbation epsilon magnitudes of 0.20 and above after threshold calibration.
Gathering enough images to train a deep computer vision model is a constant challenge. Unfortunately, collecting images from unknown sources can leave your model's behavior at risk of being manipulated by a dirty-label or clean-label attack unless the images are properly inspected. Manually inspecting each image-label pair is impractical, and common poison-detection methods that involve retraining your model can be time-consuming. This research uses GAN discriminators to protect a single class against mislabeled images and against images modified at different perturbation levels. The effect of these perturbations on a basic convolutional neural network classifier is also included for reference. The results suggest that, after training on a single class, a GAN discriminator's confidence scores can provide a threshold for identifying mislabeled images, and that the discriminator identifies 100% of the tested poison starting at a perturbation epsilon magnitude of 0.20, after decision threshold calibration using in-class samples. Developers can use this report as a basis for training their own discriminators to protect high-value classes in their CV models.
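As an illustration of what decision threshold calibration using in-class samples might look like, the sketch below picks a cutoff from discriminator scores of trusted in-class images and flags anything scoring below it. It is not the procedure from this report, whose details are not given here. It assumes the discriminator outputs a confidence score in [0, 1] where higher means "more like the protected class"; the function names and the `target_tpr` parameter are hypothetical, and the scores in the demo are synthetic.

```python
import numpy as np


def calibrate_threshold(in_class_scores, target_tpr=0.95):
    """Return a cutoff such that roughly `target_tpr` of the trusted
    in-class scores fall at or above it (hypothetical helper)."""
    scores = np.asarray(in_class_scores, dtype=float)
    if scores.size == 0:
        raise ValueError("need at least one in-class score")
    if not 0.0 < target_tpr <= 1.0:
        raise ValueError("target_tpr must be in (0, 1]")
    # The (1 - target_tpr) quantile leaves about target_tpr of the
    # in-class scores above the cutoff.
    return float(np.quantile(scores, 1.0 - target_tpr))


def flag_suspicious(scores, threshold):
    """Boolean mask: True where the score falls below the cutoff,
    i.e. the image is treated as possibly mislabeled or perturbed."""
    return np.asarray(scores, dtype=float) < threshold


if __name__ == "__main__":
    # Synthetic stand-ins for discriminator outputs (assumption, not real data).
    rng = np.random.default_rng(0)
    trusted_in_class = rng.beta(8, 2, size=1000)   # skewed toward high scores
    incoming_batch = rng.beta(2, 8, size=200)      # skewed toward low scores

    threshold = calibrate_threshold(trusted_in_class, target_tpr=0.95)
    mask = flag_suspicious(incoming_batch, threshold)
    print(f"threshold={threshold:.3f}, flagged {mask.sum()} of {mask.size}")
```

A quantile-based cutoff makes the trade-off explicit: raising `target_tpr` flags fewer genuine in-class images but lets more low-perturbation poison through, which is consistent with detection becoming reliable only above some perturbation magnitude.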