Stochastic Local Winner-Takes-All Networks Enable Profound Adversarial Robustness
This work addresses adversarial vulnerability in neural networks, a critical issue for AI systems, and offers a novel way to harden applications such as image classification. The contribution is partly incremental, since it builds on existing adversarial training methods.
The paper tackles adversarial robustness in neural networks by replacing ReLU activations with Stochastic Local Winner-Takes-All (LWTA) blocks, and reports state-of-the-art robustness against adversarial attacks while maintaining high classification rates on benign inputs.
This work explores the potency of stochastic competition-based activations, namely Stochastic Local Winner-Takes-All (LWTA), against powerful (gradient-based) white-box and black-box adversarial attacks; we especially focus on Adversarial Training settings. In our work, we replace the conventional ReLU-based nonlinearities with blocks comprising locally and stochastically competing linear units. Each network layer now yields a sparse output, depending on the outcome of winner sampling in each block. We rely on the Variational Bayesian framework for training and inference; we incorporate conventional PGD-based adversarial training arguments to increase the overall adversarial robustness. As we experimentally show, the arising networks yield state-of-the-art robustness against powerful adversarial attacks while retaining a very high classification rate in the benign case.
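To make the block-wise competition concrete, the following is a minimal NumPy sketch of a single stochastic LWTA forward pass. It is an illustration under stated assumptions, not the authors' implementation. The function name `stochastic_lwta`, the `temperature` parameter, and the choice of a softmax over the linear activations as the winner distribution are all hypothetical. The sketch covers only winner sampling and the resulting sparse output, and leaves out the Variational Bayesian training procedure and PGD-based adversarial training.

```python
import numpy as np


def stochastic_lwta(x, weights, bias, units_per_block, rng, temperature=1.0):
    """Hypothetical stochastic LWTA layer (forward pass only).

    x:       (batch, in_features) input
    weights: (in_features, num_blocks * units_per_block) linear weights
    bias:    (num_blocks * units_per_block,) bias
    Returns the sparse layer output and the one-hot winner mask.
    """
    # Linear units, grouped into blocks of competing units.
    h = x @ weights + bias
    batch = h.shape[0]
    h = h.reshape(batch, -1, units_per_block)  # (batch, blocks, units)

    # Assumption: winner probabilities are a softmax over the activations
    # in each block. Adding Gumbel noise to the logits and taking the argmax
    # draws one sample from that categorical distribution (Gumbel-max trick).
    logits = h / temperature
    noise = rng.gumbel(size=logits.shape)
    winners = np.argmax(logits + noise, axis=-1)  # (batch, blocks)

    # One-hot mask: only the sampled winner in each block passes its value on.
    mask = np.eye(units_per_block)[winners]  # (batch, blocks, units)
    out = (h * mask).reshape(batch, -1)
    return out, mask


# Usage: 3 blocks of 2 competing units each, on a batch of 4 inputs.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 6))
b = np.zeros(6)
out, mask = stochastic_lwta(x, W, b, units_per_block=2, rng=rng)
```

Because the winner is sampled and not chosen deterministically, repeated forward passes on the same input can activate different units. In each block, every unit except the winner outputs zero, which gives the sparse layer output described above.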