Enhancing Gradient-based Attacks with Symbolic Intervals
This work addresses the need for more comprehensive robustness evaluation in machine learning security, although the contribution is incremental because it builds on existing gradient-based attacks.
The paper tackles the problem of evaluating the robustness of adversarially trained neural networks against unknown attacks. It introduces interval attacks, a technique that uses symbolic interval propagation to locate adversarial examples, and reports finding on average 47% more violations than the state-of-the-art PGD attack.
Recent breakthroughs in defenses against adversarial examples, such as adversarial training, make neural networks robust against various classes of attackers (e.g., first-order gradient-based attacks). However, it remains an open question whether adversarially trained networks are truly robust under unknown attacks. In this paper, we present interval attacks, a new technique for finding adversarial examples in order to evaluate the robustness of neural networks. Interval attacks leverage symbolic interval propagation, a bound propagation technique that exploits a broader view around the current input to locate promising areas containing adversarial instances, which can then be searched with existing gradient-guided attacks. We obtain this broader view by using sound bound propagation methods to track and over-approximate the errors of the network within given input ranges. Our results show that, on state-of-the-art adversarially trained networks, interval attacks find on average 47% more violations (in relative terms) than the state-of-the-art gradient-guided PGD attack.
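To make the idea of sound bound propagation concrete, the following is a minimal sketch, not the paper's method. It uses plain (non-symbolic) interval arithmetic, a simpler and looser stand-in for symbolic interval propagation, to over-approximate the outputs of a small ReLU network over an L-infinity box around an input. The network, its random weights, the radius `eps`, and all function names (`affine_bounds`, `relu_bounds`, `network_bounds`, `forward`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def affine_bounds(lo, hi, W, b):
    """Sound bounds on W @ x + b for every x in the box [lo, hi]."""
    W_pos = np.maximum(W, 0.0)  # positive weights pair lower with lower
    W_neg = np.minimum(W, 0.0)  # negative weights pair lower with upper
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to each bound directly."""
    return np.maximum(lo, 0.0), np.maximum(hi, 0.0)

def network_bounds(x, eps, layers):
    """Propagate the box [x - eps, x + eps] through affine + ReLU layers."""
    lo, hi = x - eps, x + eps
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(lo, hi, W, b)
        if i < len(layers) - 1:  # no ReLU after the output layer
            lo, hi = relu_bounds(lo, hi)
    return lo, hi

def forward(x, layers):
    """Ordinary forward pass for a single concrete input."""
    for i, (W, b) in enumerate(layers):
        x = W @ x + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x

# Hypothetical toy network: 4 inputs -> 8 hidden units -> 3 outputs.
rng = np.random.default_rng(0)
layers = [
    (rng.normal(size=(8, 4)), rng.normal(size=8)),
    (rng.normal(size=(3, 8)), rng.normal(size=3)),
]
x0 = rng.normal(size=4)
eps = 0.1  # assumed perturbation radius

out_lo, out_hi = network_bounds(x0, eps, layers)
print("lower bounds:", out_lo)
print("upper bounds:", out_hi)
```

Because the bounds are sound, every concrete input inside the box produces outputs between `out_lo` and `out_hi`. Bounds of this kind summarize the network's behavior over a whole region rather than at a single point, which is the "broader view" the abstract refers to. Symbolic interval propagation would typically give tighter bounds than this naive version, because it keeps track of how intermediate values depend on the input.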