CV · LG · Jun 16, 2020

The shape and simplicity biases of adversarially robust ImageNet-trained CNNs

arXiv:2006.09373v6 · 21 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of understanding what enables adversarially robust CNNs to generalize better, a question relevant to researchers in machine learning and computer vision. The advance is incremental because it builds on existing adversarial training methods.

The paper investigates how adversarial training affects the internal biases of ImageNet-trained CNNs. It finds that adversarial training shifts the networks from a texture bias to a shape bias and induces three simplicity biases in hidden neurons: smoother patterns, lower-level features, and fewer types of inputs. These biases contribute to robustness and help explain prior findings, such as why robust networks benefit from larger capacity and why they work well as image priors in image synthesis.
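The texture-versus-shape distinction above is usually quantified on cue-conflict images, where an object's shape and its texture come from different classes. The sketch below is a minimal illustration of that kind of metric, assuming shape bias is defined as the fraction of shape decisions among predictions that match either cue. The function name `shape_bias` and the toy labels are hypothetical and are not taken from the paper's code.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape decisions among predictions that match either cue.

    Each image has a shape label and a conflicting texture label.
    Predictions matching neither cue are ignored.
    """
    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # not a cue-conflict image, so it carries no signal
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
    decided = shape_hits + texture_hits
    if decided == 0:
        return float("nan")  # no prediction matched either cue
    return shape_hits / decided


# Toy example: four cat-shaped images rendered with elephant texture.
preds = ["cat", "elephant", "dog", "cat"]
shapes = ["cat"] * 4
textures = ["elephant"] * 4
bias = shape_bias(preds, shapes, textures)
```

A value near 1 would indicate a classifier that mostly follows shape, and a value near 0 one that mostly follows texture.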

A growing number of similarities between human vision and convolutional neural networks (CNNs) have been revealed in the past few years. Yet vanilla CNNs often fall short in generalizing to adversarial or out-of-distribution (OOD) examples, on which humans demonstrate superior performance. Adversarial training is a leading learning algorithm for improving the robustness of CNNs on adversarial and OOD data; however, little is known about the properties, specifically the shape bias and internal features, learned inside adversarially robust CNNs. In this paper, we perform a thorough, systematic study to understand the shape bias and some internal mechanisms that enable the generalizability of AlexNet, GoogLeNet, and ResNet-50 models trained via adversarial training. We find that while standard ImageNet classifiers have a strong texture bias, their adversarially robust (R) counterparts rely heavily on shapes. Remarkably, adversarial training induces three simplicity biases into hidden neurons in the process of "robustifying" CNNs. That is, each convolutional neuron in R networks often changes to detecting (1) pixel-wise smoother patterns, i.e., a mechanism that blocks high-frequency noise from passing through the network; (2) more lower-level features, i.e., textures and colors (instead of objects); and (3) fewer types of inputs. Our findings reveal the interesting mechanisms that make networks more adversarially robust and also explain some recent findings, e.g., why R networks benefit from a much larger capacity (Xie et al. 2020) and can act as a strong image prior in image synthesis (Santurkar et al. 2019).
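One way to make "pixel-wise smoother patterns" concrete is to score a convolutional kernel by its total variation, where lower values mean neighboring weights differ less. The sketch below uses anisotropic total variation on a toy 3x3 kernel. This is an illustrative assumption about how smoothness could be measured, not a statement of the paper's exact procedure, and the helper name `total_variation` is hypothetical.

```python
import numpy as np


def total_variation(kernel):
    """Anisotropic total variation over the last two (spatial) axes.

    Sums absolute differences between horizontally and vertically
    adjacent weights; a constant kernel scores 0.
    """
    k = np.asarray(kernel, dtype=float)
    horizontal = np.abs(np.diff(k, axis=-1)).sum()
    vertical = np.abs(np.diff(k, axis=-2)).sum()
    return float(horizontal + vertical)


# A constant (maximally smooth) kernel versus a checkerboard
# (high-frequency) kernel.
smooth = np.ones((3, 3))
checker = np.array([[1, -1, 1],
                    [-1, 1, -1],
                    [1, -1, 1]])

tv_smooth = total_variation(smooth)
tv_checker = total_variation(checker)
```

Under this measure the checkerboard kernel scores far higher than the constant one, which matches the intuition that a smoother filter responds less to high-frequency noise.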

Code Implementations: 1 repo