LG CR ML Mar 6, 2024

Effect of Ambient-Intrinsic Dimension Gap on Adversarial Vulnerability

arXiv:2403.03967v210.46 citations h-index: 8 AISTATS

Originality: Highly original

AI Analysis

This work addresses a foundational problem in machine learning security by offering a theoretical explanation of adversarial vulnerability. It is incremental in that it builds on existing theories, but it offers new insights into the role of the dimension gap.

The paper addresses the open theoretical question of why adversarial attacks exist by distinguishing between natural (on-manifold) and unnatural (off-manifold) attacks, arguing that off-manifold attacks arise from the gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, it proves that this gap does not affect generalization on the observed data but makes the clean-trained model more vulnerable to off-manifold perturbations, and it gives explicit relationships between the $\ell_2,\ell_{\infty}$ attack strength and the dimension gap.

The existence of adversarial attacks on machine learning models imperceptible to a human is still quite a mystery from a theoretical perspective. In this work, we introduce two notions of adversarial attacks: natural or on-manifold attacks, which are perceptible by a human/oracle, and unnatural or off-manifold attacks, which are not. We argue that the existence of the off-manifold attacks is a natural consequence of the dimension gap between the intrinsic and ambient dimensions of the data. For 2-layer ReLU networks, we prove that even though the dimension gap does not affect generalization performance on samples drawn from the observed data space, it makes the clean-trained model more vulnerable to adversarial perturbations in the off-manifold direction of the data space. Our main results provide an explicit relationship between the $\ell_2,\ell_{\infty}$ attack strength of the on/off-manifold attack and the dimension gap.
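To make the on/off-manifold distinction concrete, the following is a minimal sketch, not the paper's construction. It assumes the data manifold is a linear subspace of the ambient space, which is a simplification chosen for illustration. The function name `split_perturbation`, the dimensions `D` and `d`, and the random basis are all hypothetical. The sketch splits a perturbation into its component inside the subspace (on-manifold) and its component orthogonal to it (off-manifold), then measures the $\ell_2$ and $\ell_{\infty}$ size of each.

```python
import numpy as np


def split_perturbation(delta, basis):
    """Split a perturbation into on-manifold and off-manifold parts.

    Assumes (for illustration only) that the data lie on a linear subspace.
    basis: (D, d) array whose orthonormal columns span that subspace.
    delta: (D,) perturbation in the ambient space.
    """
    on = basis @ (basis.T @ delta)  # orthogonal projection onto the subspace
    off = delta - on                # remainder, orthogonal to the subspace
    return on, off


rng = np.random.default_rng(0)
D, d = 50, 5  # hypothetical ambient and intrinsic dimensions; gap is D - d

# Reduced QR of a random (D, d) matrix gives an orthonormal basis of a
# random d-dimensional subspace.
basis, _ = np.linalg.qr(rng.standard_normal((D, d)))

delta = rng.standard_normal(D)
on, off = split_perturbation(delta, basis)

# Attack strength of each component in the two norms named in the abstract.
on_l2, off_l2 = np.linalg.norm(on), np.linalg.norm(off)
on_linf, off_linf = np.linalg.norm(on, ord=np.inf), np.linalg.norm(off, ord=np.inf)

print(f"l2:   on={on_l2:.3f}  off={off_l2:.3f}")
print(f"linf: on={on_linf:.3f}  off={off_linf:.3f}")
```

In this toy setting the off-manifold component lives in a space of dimension `D - d`, so a perturbation drawn at random in the ambient space tends to place more of its energy off the subspace as the gap grows. This is only an intuition for the setup and does not reproduce the paper's results for trained networks.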
