LG AI CR DS ML

May 25, 2019

Enhancing Adversarial Defense by k-Winners-Take-All

arXiv:1905.10510v324.2114 citations

Has Code

Originality: Incremental advance

AI Analysis

This work addresses the adversarial vulnerability of neural networks with a simple, drop-in change to the activation function. It appears incremental because it modifies existing activation functions rather than introducing a new paradigm.

The authors defend neural networks against gradient-based adversarial attacks by replacing standard activation functions with k-Winners-Take-All (k-WTA), a discontinuous function that invalidates the network's gradient at densely distributed input points. They report that k-WTA networks were more robust than traditional networks under white-box attacks in every configuration tested, across network structures and with or without adversarial training.

We propose a simple change to existing neural network structures for better defending against gradient-based adversarial attacks. Instead of using popular activation functions (such as ReLU), we advocate the use of k-Winners-Take-All (k-WTA) activation, a C^0 discontinuous function that purposely invalidates the neural network model's gradient at densely distributed input data points. The proposed k-WTA activation can be readily used in nearly all existing networks and training methods with no significant overhead. Our proposal is theoretically rationalized. We analyze why the discontinuities in k-WTA networks can largely prevent gradient-based search of adversarial examples and why they at the same time remain innocuous to the network training. This understanding is also empirically backed. We test k-WTA activation on various network structures optimized by a training method, be it adversarial training or not. In all cases, the robustness of k-WTA networks outperforms that of traditional networks under white-box attacks.
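To make the activation described in the abstract concrete, here is a minimal sketch in plain Python. It assumes k-WTA keeps the k largest entries of a layer's pre-activation vector and sets the rest to zero. The function name `kwta`, the direct `k` parameter, and the sample vectors are hypothetical choices for illustration and are not taken from the authors' released code.

```python
import heapq


def kwta(values, k):
    """Keep the k largest entries of `values` and zero out all others.

    Illustrative sketch only: `kwta` and its signature are assumptions,
    not the authors' implementation.
    """
    if not 0 < k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Indices of the k largest entries (the "winners").
    winners = set(heapq.nlargest(k, range(len(values)), key=values.__getitem__))
    return [v if i in winners else 0.0 for i, v in enumerate(values)]


# Hypothetical pre-activation vector: units 0 and 1 win.
x = [0.50, 1.20, 0.49, -0.30]
print(kwta(x, 2))  # [0.5, 1.2, 0.0, 0.0]

# Nudging unit 2 from 0.49 to 0.51 changes the winner set:
# unit 0 drops abruptly from 0.5 to 0.0 while unit 2 jumps to 0.51.
x_shifted = [0.50, 1.20, 0.51, -0.30]
print(kwta(x_shifted, 2))  # [0.0, 1.2, 0.51, 0.0]
```

The second call shows the kind of discontinuity the abstract refers to: a small input change swaps which units are active, so the output jumps instead of varying smoothly. In a deep learning framework the same idea would typically be expressed with a batched top-k operation over each layer's activations.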
