SafetyNet: Detecting and Rejecting Adversarial Examples Robustly
This work addresses security vulnerabilities in AI systems for applications such as image authentication, though it appears incremental because it builds on existing adversarial defense concepts.
The authors tackle the problem of adversarial examples in deep networks by introducing SafetyNet, a method that makes it difficult for attacks such as DeepFool to succeed, and they apply it in SceneProof, which uses RGBD data to detect whether an image shows a real scene.
We describe a method to produce a network where current methods such as DeepFool have great difficulty producing adversarial samples. Our construction suggests some insights into how deep networks work. We provide a reasonable analysis suggesting that our construction is difficult to defeat, and show experimentally that our method is hard to defeat with both Type I and Type II attacks using several standard networks and datasets. This SafetyNet architecture is used in an important and novel application, SceneProof, which can reliably detect whether an image is a picture of a real scene or not. SceneProof applies to images captured with depth maps (RGBD images) and checks whether an image and its depth map are consistent. It relies on the relative difficulty of producing naturalistic depth maps for images in post-processing. We demonstrate that our SafetyNet is robust to adversarial examples built from currently known attack approaches.
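The detect-and-reject idea named in the title can be illustrated with a minimal sketch. It assumes only that a classifier is paired with a separate detector whose score decides whether the input is rejected. The names `classify_with_rejection`, `classifier`, `detector`, and `threshold`, along with the toy stand-in functions, are hypothetical and are not taken from the paper. The sketch does not reproduce how SafetyNet builds its detector.

```python
from typing import Callable, Optional, Sequence

# Hypothetical names throughout: `classifier`, `detector`, and `threshold`
# are illustrative stand-ins, not the paper's actual components.
def classify_with_rejection(
    x: Sequence[float],
    classifier: Callable[[Sequence[float]], int],
    detector: Callable[[Sequence[float]], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the predicted label, or None if the input is rejected."""
    # The detector returns a suspicion score; higher means more suspicious.
    if detector(x) >= threshold:
        return None  # reject instead of trusting the classifier's label
    return classifier(x)


# Toy stand-ins so the sketch runs end to end (arbitrary rules, for illustration only).
def toy_classifier(x: Sequence[float]) -> int:
    return int(sum(x) > 0)


def toy_detector(x: Sequence[float]) -> float:
    # Flags inputs that contain an unusually large component.
    return 1.0 if max(abs(v) for v in x) > 10 else 0.0


clean_result = classify_with_rejection([0.2, 0.5], toy_classifier, toy_detector)
suspicious_result = classify_with_rejection([0.2, 50.0], toy_classifier, toy_detector)
```

Under this arrangement, an attack has to change the classifier's label while also keeping the detector's score below the threshold, which is a stricter requirement than fooling the classifier alone.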