When Not to Classify: Detection of Reverse Engineering Attacks on DNN Image Classifiers
This paper addresses a security vulnerability for users of DNN image classifiers, but the contribution is incremental: it extends the authors' earlier ADA method.
The paper tackles the problem of detecting reverse engineering attacks on DNN image classifiers, extending an existing method (ADA) to ADA-RE, which successfully detects stealthy attacks before they can enable effective test-time evasion.
This paper addresses detection of a reverse engineering (RE) attack targeting a deep neural network (DNN) image classifier. By querying the classifier, an RE attacker aims to discover its decision rule. RE can enable test-time evasion attacks, which require knowledge of the classifier. We recently proposed an effective approach (ADA) to detect test-time evasion attacks. In this paper, we extend ADA to detect RE attacks (ADA-RE). We demonstrate that our method successfully detects "stealthy" RE attacks before they learn enough to launch effective test-time evasion attacks.
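To make the threat model concrete, the following is a minimal sketch of query-based reverse engineering in general, not the attack or the ADA-RE detector evaluated in the paper. It assumes a toy linear classifier in place of a DNN, and every name in it (`black_box`, `fit_surrogate`, `w_true`) is hypothetical. The attacker sees only the labels returned for its queries and fits a surrogate to them; agreement between surrogate and victim on fresh inputs is a rough proxy for how much of the decision rule has been learned.

```python
import math
import random


def black_box(x, w_true):
    """Hypothetical victim classifier: exposes only the predicted label (0 or 1)."""
    return 1 if sum(a * b for a, b in zip(x, w_true)) > 0 else 0


def fit_surrogate(queries, labels, epochs=100, lr=0.1):
    """Fit a logistic-regression surrogate to (query, label) pairs by gradient descent."""
    dim = len(queries[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in zip(queries, labels):
            z = sum(a * b for a, b in zip(x, w))
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(dim):
                grad[j] += (p - y) * x[j]
        w = [wj - lr * gj / len(queries) for wj, gj in zip(w, grad)]
    return w


rng = random.Random(0)
dim = 5

# The victim's decision rule, unknown to the attacker (assumed linear for illustration).
w_true = [rng.gauss(0, 1) for _ in range(dim)]

# The attacker submits queries and records only the returned labels.
queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(500)]
labels = [black_box(x, w_true) for x in queries]

# The surrogate is trained purely from the query responses.
w_hat = fit_surrogate(queries, labels)

# Agreement on fresh inputs estimates how well the decision rule was recovered.
held_out = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(2000)]
agreement = sum(
    black_box(x, w_true) == black_box(x, w_hat) for x in held_out
) / len(held_out)
print(f"surrogate/victim agreement: {agreement:.3f}")
```

A surrogate that agrees closely with the victim is what lets an attacker craft test-time evasion inputs offline, which is why detecting the querying phase early matters.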