On the Adversarial Robustness of Neural Networks without Weight Transport
This paper addresses the vulnerability of neural networks to adversarial attacks, a critical security issue for AI systems, but the contribution is incremental because it builds on existing feedback alignment methods.
The paper tackles adversarial robustness in neural networks by showing that networks trained without weight transport, using feedback alignment, are harder to fool than those trained with backpropagation. On MNIST they achieve 98% adversarial accuracy compared to 0.03% for backpropagation, though the gap decreases on CIFAR-10.
Neural networks trained with backpropagation, the standard algorithm of deep learning which uses weight transport, are easily fooled by existing gradient-based adversarial attacks. This class of attacks relies on small perturbations of the inputs that make networks misclassify them. We show that less biologically implausible deep neural networks trained with feedback alignment, which do not use weight transport, can be harder to fool, providing actual robustness. Tested on MNIST, deep neural networks trained without weight transport (1) have an adversarial accuracy of 98% compared to 0.03% for neural networks trained with backpropagation and (2) generate non-transferable adversarial examples. This gap decreases on CIFAR-10 but remains significant, particularly for small perturbation magnitudes below 1/2.
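To make the distinction in the abstract concrete, the following minimal NumPy sketch contrasts the backward signal that reaches the input under backpropagation (feedback through the transposed forward weights) with the one under feedback alignment (feedback through fixed random matrices), and uses each to build a sign-based perturbation in the style of the fast gradient sign method. The two-layer tanh network, the squared-error loss, the layer sizes, the value of eps, and all names (W1, W2, B1, B2, input_signal, fgsm) are assumptions chosen for illustration; they are not the paper's architecture, attack settings, or code, and the weights here are untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 8, 16, 4          # illustrative sizes (assumption)

# Forward weights, shared by both variants.
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))

# Fixed random feedback matrices used in place of W1.T and W2.T
# when there is no weight transport (feedback alignment).
B1 = rng.normal(scale=0.5, size=(n_in, n_hid))
B2 = rng.normal(scale=0.5, size=(n_hid, n_out))


def loss(x, target):
    """Squared-error loss of a two-layer tanh network."""
    h = np.tanh(W1 @ x)
    return 0.5 * np.sum((W2 @ h - target) ** 2)


def input_signal(x, target, fb1, fb2):
    """Backward signal at the input.

    fb1=W1.T, fb2=W2.T gives the true gradient (backpropagation).
    fb1=B1,   fb2=B2   gives the feedback-alignment signal.
    """
    h = np.tanh(W1 @ x)
    e = W2 @ h - target                      # dL/dy
    delta = (fb2 @ e) * (1.0 - h ** 2)       # tanh'(a) = 1 - tanh(a)^2
    return fb1 @ delta


def fgsm(x, signal, eps):
    """Sign-based perturbation with max-norm budget eps."""
    return x + eps * np.sign(signal)


x = rng.normal(size=n_in)
target = np.eye(n_out)[1]

g_bp = input_signal(x, target, W1.T, W2.T)   # with weight transport
g_fa = input_signal(x, target, B1, B2)       # without weight transport

x_adv_bp = fgsm(x, g_bp, eps=0.1)
x_adv_fa = fgsm(x, g_fa, eps=0.1)
```

In this sketch only `g_bp` is the exact gradient of the loss with respect to the input. An attacker who reuses the network's own backward pass under feedback alignment obtains `g_fa` instead, which is one plausible reading of why gradient-based attacks can be less effective against such networks.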