On the Transferability of Adversarial Examples Against CNN-Based Image Forensics
This work addresses security vulnerabilities in image forensics applications, but its findings are incremental: they confirm limited transferability in one specific domain.
The paper investigates whether adversarial examples transfer between CNN-based image forensic tools, specifically for manipulation detection, and finds that in most cases the attacks are not transferable, easing countermeasure design when the attacker lacks perfect knowledge of the target detector.
Recent studies have shown that Convolutional Neural Networks (CNNs) are relatively easy to attack through the generation of so-called adversarial examples. This vulnerability also affects CNN-based image forensic tools. Research in deep learning has shown that adversarial examples exhibit a certain degree of transferability, i.e., they maintain part of their effectiveness even against CNN models other than the one targeted by the attack. This is a very strong property that undermines the usability of CNNs in security-oriented applications. In this paper, we investigate whether attack transferability also holds in image forensics applications. With specific reference to the case of manipulation detection, we analyse the results of several experiments considering different sources of mismatch between the CNN used to build the adversarial examples and the one adopted by the forensic analyst. The analysis ranges from cases in which the mismatch involves only the training dataset to cases in which the attacker and the forensic analyst adopt different architectures. The results of our experiments show that, in the majority of cases, the attacks are not transferable, thus easing the design of proper countermeasures, at least when the attacker does not have perfect knowledge of the target detector.
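The notion of transferability used above can be made concrete with a toy sketch. It is not the experimental setup of the paper: two linear detectors stand in for the attacker's CNN and the analyst's CNN, the attack is a one-step sign-gradient perturbation, and all names (`predict`, `sign_gradient_attack`, `transfer_rate`) are hypothetical. The transfer rate is taken here to be the fraction of adversarial examples that fool the source detector and also fool the target detector.

```python
import numpy as np

def predict(w, b, x):
    """Toy linear detector: 1 = manipulated, 0 = pristine."""
    return (x @ w + b > 0).astype(int)

def sign_gradient_attack(w, x, eps):
    """One-step sign-gradient attack on the source detector.
    For a linear score w.x + b, the gradient with respect to x is w,
    so stepping against sign(w) lowers the 'manipulated' score."""
    return x - eps * np.sign(w)

def transfer_rate(w_src, b_src, w_tgt, b_tgt, x, eps):
    """Fraction of successful source attacks that also fool the target."""
    # Attack only samples that both detectors flag as manipulated.
    flagged = (predict(w_src, b_src, x) == 1) & (predict(w_tgt, b_tgt, x) == 1)
    x_adv = sign_gradient_attack(w_src, x[flagged], eps)
    fooled_src = predict(w_src, b_src, x_adv) == 0
    fooled_tgt = predict(w_tgt, b_tgt, x_adv) == 0
    n_src = int(fooled_src.sum())
    if n_src == 0:
        return 0.0  # no successful attack on the source, nothing to transfer
    return float((fooled_src & fooled_tgt).sum() / n_src)

# Two 'manipulated' samples in a 2-D feature space (assumed toy data).
x = np.array([[0.5, 0.5], [0.2, 1.0]])
w_src = np.array([1.0, 0.0])

# Perfect knowledge: the target is identical to the source detector.
print(transfer_rate(w_src, 0.0, w_src, 0.0, x, eps=1.0))                 # 1.0
# Full mismatch: the target relies on a feature the attack never touches.
print(transfer_rate(w_src, 0.0, np.array([0.0, 1.0]), 0.0, x, eps=1.0))  # 0.0
```

The two extreme cases bracket the question studied in the paper: a real mismatch in training data or architecture falls somewhere between an identical target and one that relies on entirely different features.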