CV, AI, LG · Mar 22, 2023

An Extended Study of Human-like Behavior under Adversarial Training

arXiv:2303.12669v1 · 12 citations · h-index: 18
Originality: Incremental advance
AI Analysis

This work addresses the problem of improving neural network robustness and human-like reasoning for AI safety and generalization, but it is incremental as it builds on prior observations.

The study investigates how adversarial training affects neural networks' shift towards shape-based reasoning, similar to humans, by analyzing its impact across different architectures and training methods, and proposes a frequency-based explanation for the phenomenon.

Neural networks have a number of shortcomings. Among the most severe is their sensitivity to distribution shifts, which allows models to be easily fooled into wrong predictions by small input perturbations that are often imperceptible to humans and need not carry semantic meaning. Adversarial training offers a partial solution by training models on worst-case perturbations. Yet recent work has also pointed out that neural networks reason differently from humans: humans identify objects by shape, while neural networks mainly rely on texture cues. For example, a model trained on photographs will likely fail to generalize to datasets containing sketches. Interestingly, it was also shown that adversarial training seems to shift models toward a shape bias. In this work, we revisit this observation and provide an extensive analysis of this effect on various architectures, the common $\ell_2$- and $\ell_\infty$-training, and Transformer-based models. Further, we provide a possible explanation for this phenomenon from a frequency perspective.
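
For orientation, training on worst-case perturbations is commonly written as a min-max problem. The notation here is an assumption for illustration and is not taken from the abstract: $f_\theta$ is the model, $\mathcal{L}$ the training loss, $\mathcal{D}$ the data distribution, and $\epsilon$ the perturbation budget. The choices $p = 2$ and $p = \infty$ correspond to the $\ell_2$- and $\ell_\infty$-training named in the abstract.

$$
\min_{\theta} \; \mathbb{E}_{(x, y) \sim \mathcal{D}} \left[ \max_{\|\delta\|_p \le \epsilon} \mathcal{L}\big(f_\theta(x + \delta),\, y\big) \right], \qquad p \in \{2, \infty\}
$$

The inner maximization searches for the perturbation $\delta$ within the budget that most increases the loss, and the outer minimization fits the model to those perturbed inputs. In practice the inner problem is usually only approximated, for example with a few steps of projected gradient ascent.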

Code Implementations: 1 repo