Effectiveness of random deep feature selection for securing image manipulation detectors against adversarial examples
This work addresses security vulnerabilities in image manipulation detectors for forensic applications, but it is incremental as it builds on prior feature randomization methods.
The study investigated whether random feature selection can improve the robustness of deep learning-based image manipulation detectors against adversarial examples. It found that feature randomization hinders attack transferability across different detectors and tasks, although in some cases changing the detector architecture, or simply retraining the detector, is already enough to prevent transfer.
We investigate whether the random feature selection approach proposed in [1] to improve the robustness of forensic detectors against targeted attacks can be extended to detectors based on deep learning features. In particular, we study the transferability of adversarial examples targeting an original CNN image manipulation detector to other detectors (a fully connected neural network and a linear SVM) that rely on a random subset of the features extracted from the flatten layer of the original network. The results, obtained by considering three image manipulation detection tasks (resizing, median filtering, and adaptive histogram equalization), two original network architectures, and three classes of attacks, show that feature randomization helps to hinder attack transferability, even if, in some cases, simply changing the architecture of the detector, or even retraining the detector, is enough to prevent the transferability of the attacks.
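
The random feature selection step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The feature vectors are synthetic stand-ins for flatten-layer activations, the names `select_feature_indices`, `project`, `fit_centroids`, and `predict` are hypothetical, and a nearest-centroid rule replaces the fully connected network and linear SVM used in the study so that the example needs only the Python standard library. The key idea it shows is that the secondary detector is trained on a subset of feature indices drawn with a seed that is assumed to be unknown to the attacker.

```python
import random


def select_feature_indices(num_features, subset_size, secret_seed):
    """Draw a random subset of feature indices without replacement.

    The seed plays the role of a secret key (an assumption of this sketch).
    """
    rng = random.Random(secret_seed)
    return sorted(rng.sample(range(num_features), subset_size))


def project(features, indices):
    """Keep only the selected coordinates of one feature vector."""
    return [features[i] for i in indices]


def fit_centroids(vectors, labels):
    """Stand-in detector: compute the per-class mean of the reduced features."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for j, value in enumerate(vec):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Synthetic data standing in for deep features (0 = pristine, 1 = manipulated).
NUM_FEATURES, SUBSET_SIZE = 200, 30
data_rng = random.Random(0)


def toy_features(label):
    shift = 1.0 if label == 1 else 0.0
    return [data_rng.gauss(shift, 1.0) for _ in range(NUM_FEATURES)]


labels = [0, 1] * 50
full_features = [toy_features(y) for y in labels]

# Train the secondary detector on the secret random subset only.
indices = select_feature_indices(NUM_FEATURES, SUBSET_SIZE, secret_seed=1234)
reduced = [project(f, indices) for f in full_features]
centroids = fit_centroids(reduced, labels)
predictions = [predict(centroids, vec) for vec in reduced]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

In a transferability experiment of the kind described in the abstract, adversarial examples would be crafted against the original CNN and then passed through the same feature extractor and `project` step before being scored by the secondary detector. That part is omitted here because it depends on the specific networks and attacks.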