LG AI CV
Jul 10, 2021

Identifying Layers Susceptible to Adversarial Attacks

arXiv:2107.04827v23.13 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses adversarial robustness in deep learning models, a problem of practical interest to practitioners. Its contribution is incremental: it supports existing hypotheses without introducing new methods.

The paper investigates which layers of a neural network are susceptible to adversarial attacks by selectively retraining portions of VGG and ResNet architectures on CIFAR-10, Imagenette, and ImageNet. The results associate susceptibility with the low-level feature extraction layers, so retraining high-level layers alone is insufficient for robustness. Adversarial samples also produce early-layer outputs that differ statistically from those of non-adversarial samples.

In this paper, we investigate the use of pretraining with adversarial networks, with the objective of discovering the relationship between network depth and robustness. For this purpose, we selectively retrain different portions of VGG and ResNet architectures on CIFAR-10, Imagenette, and ImageNet using non-adversarial and adversarial data. Experimental results show that susceptibility to adversarial samples is associated with low-level feature extraction layers. Therefore, retraining of high-level layers is insufficient for achieving robustness. Furthermore, adversarial attacks yield outputs from early layers that differ statistically from features for non-adversarial samples and do not permit consistent classification by subsequent layers. This supports common hypotheses regarding the association of robustness with the feature extractor, insufficiency of deeper layers in providing robustness, and large differences in adversarial and non-adversarial feature vectors.
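The selective-retraining design described above can be pictured as a set of retraining plans, one per cut point in the network. The sketch below is only an illustration under stated assumptions: the block names, the `RetrainPlan` type, and the `retrain_plans` helper are hypothetical, and the abstract does not say that the retrained portions are always a contiguous suffix of the network.

```python
from typing import List, NamedTuple


class RetrainPlan(NamedTuple):
    frozen: List[str]     # blocks that keep their existing weights
    retrained: List[str]  # blocks whose weights are updated during retraining


def retrain_plans(blocks: List[str]) -> List[RetrainPlan]:
    """Return one plan per cut point k: freeze the first k blocks, retrain the rest.

    Assumption: portions are split at a single depth; other splits are possible.
    """
    return [RetrainPlan(blocks[:k], blocks[k:]) for k in range(len(blocks) + 1)]


# Hypothetical block names for a VGG-style network, ordered from input to output.
vgg_blocks = ["conv1", "conv2", "conv3", "conv4", "conv5", "classifier"]

plans = retrain_plans(vgg_blocks)
for plan in plans:
    print(f"frozen={plan.frozen} retrained={plan.retrained}")
```

In this framing, a plan that retrains only `classifier` corresponds to retraining high-level layers alone, which the abstract reports as insufficient for robustness, while plans with a small `frozen` list also update the low-level feature extraction layers.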
