FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases
FreeEagle addresses a critical security problem for maintainers of AI models, such as the operators of model-sharing platforms, by detecting backdoors in data-free settings. This distinguishes it from existing detection methods, which rely on access to data.
The paper tackles the problem of detecting complex neural trojans (backdoor attacks) in deep neural networks without access to clean or triggered data. It proposes FreeEagle, a data-free method that effectively identifies such attacks and outperforms some state-of-the-art non-data-free detection methods.
Trojan attacks on deep neural networks, also known as backdoor attacks, are a typical threat to artificial intelligence. A trojaned neural network behaves normally on clean inputs. However, if an input contains a particular trigger, the trojaned model exhibits abnormal behavior chosen by the attacker. Although many backdoor detection methods exist, most of them assume that the defender has access to a set of clean validation samples or to samples containing the trigger. This assumption may not hold in some crucial real-world cases, e.g., when the defender is the maintainer of a model-sharing platform. Thus, in this paper, we propose FreeEagle, the first data-free backdoor detection method that can effectively detect complex backdoor attacks on deep neural networks without access to any clean samples or samples containing the trigger. Evaluation results on diverse datasets and model architectures show that FreeEagle is effective against various complex backdoor attacks, even outperforming some state-of-the-art non-data-free backdoor detection methods.
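To make the threat model concrete, here is a minimal sketch of the behavior described above, using a hand-written toy classifier instead of a trained network. Everything in it is an illustrative assumption: the names `clean_classifier`, `trojaned_classifier`, `has_trigger`, and `stamp_trigger`, the 3x3 white-patch trigger, and the target label 7 are hypothetical. It is neither FreeEagle nor any attack evaluated in the paper.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel intensities in [0, 1]

TARGET_LABEL = 7  # hypothetical attacker-chosen label
TRIGGER_SIZE = 3  # hypothetical trigger: an all-white 3x3 patch, bottom-right corner


def has_trigger(image: Image) -> bool:
    """Return True if the bottom-right TRIGGER_SIZE x TRIGGER_SIZE patch is all white."""
    return all(
        pixel >= 1.0
        for row in image[-TRIGGER_SIZE:]
        for pixel in row[-TRIGGER_SIZE:]
    )


def clean_classifier(image: Image) -> int:
    """Toy stand-in for the benign task: bucket mean brightness into classes 0..9."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return min(int(mean * 10), 9)


def trojaned_classifier(image: Image) -> int:
    """Agrees with clean_classifier unless the trigger is present.

    A real trojaned network learns this switch into its weights; the explicit
    branch here is only for illustration.
    """
    if has_trigger(image):
        return TARGET_LABEL
    return clean_classifier(image)


def stamp_trigger(image: Image) -> Image:
    """Return a copy of image with the trigger patch pasted in (input is not modified)."""
    stamped = [row[:] for row in image]
    for r in range(len(stamped) - TRIGGER_SIZE, len(stamped)):
        for c in range(len(stamped[r]) - TRIGGER_SIZE, len(stamped[r])):
            stamped[r][c] = 1.0
    return stamped


clean = [[0.25] * 8 for _ in range(8)]  # uniform 8x8 image
print(trojaned_classifier(clean))                 # 2, same as clean_classifier
print(trojaned_classifier(stamp_trigger(clean)))  # 7, the attacker-chosen label
```

Because a real network hides this mapping in its weights rather than in a visible branch, a defender who has neither clean nor triggered samples, as in the data-free setting, cannot simply run such inputs through the model to expose it.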