Deep Learning Backdoors
Backdoor attacks are a security vulnerability in AI systems, posing risks for users and developers who rely on deep neural networks (DNNs).
The paper tackles the problem of backdoor attacks on Deep Neural Networks, in which hidden malicious behavior is injected into a model so that it performs a predefined action whenever its input contains a specific trigger, such as an object or an image filter.
Intuitively, a backdoor attack against Deep Neural Networks (DNNs) injects hidden malicious behaviors into a DNN such that the backdoored model behaves legitimately on benign inputs, yet invokes a predefined malicious behavior when its input contains a malicious trigger. The trigger can take a plethora of forms, including a special object present in the image (e.g., a yellow pad), a shape filled with custom textures (e.g., logos with particular colors), or even image-wide stylizations with special filters (e.g., images altered by Nashville or Gotham filters). Such triggers can be applied to the original image by replacing or perturbing a set of image pixels.
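The two ways of applying a trigger mentioned above, replacing pixels and perturbing pixels, can be illustrated with a minimal sketch. This is not the paper's implementation: the helper names stamp_patch and blend_filter, the tiny grayscale grid, and the uniform tint used as a stand-in for a stylization filter are all assumptions made for illustration.

```python
# Illustrative sketch only; all names here are hypothetical.
# An image is a grayscale grid (list of rows) of ints in the range 0..255.

def stamp_patch(image, patch, top, left):
    """Replacement trigger: overwrite a region of the image with a fixed patch."""
    out = [row[:] for row in image]  # copy so the benign image is left untouched
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out


def blend_filter(image, tint, alpha):
    """Perturbation trigger: shift every pixel toward a tint value.

    A crude stand-in for an image-wide filter; real stylization filters
    are more elaborate than a uniform blend.
    """
    return [
        [min(255, max(0, round((1 - alpha) * p + alpha * tint))) for p in row]
        for row in image
    ]


benign = [[10, 20, 30, 40] for _ in range(4)]

# Replace a 2x2 corner with a bright patch (object-style trigger).
patched = stamp_patch(benign, [[255, 255], [255, 255]], top=2, left=2)

# Perturb all pixels (filter-style trigger).
filtered = blend_filter(benign, tint=200, alpha=0.5)
```

In a poisoning scenario, an attacker would typically pair such modified images with the attacker-chosen label during training; that step is omitted here because the text above does not describe it.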