Adversarial flows: A gradient flow characterization of adversarial attacks
The paper provides a theoretical foundation for adversarial attacks in machine learning, though the contribution is incremental in that it builds on existing gradient flow concepts.
The paper interprets adversarial attack methods such as the fast gradient sign method as explicit Euler discretizations of a differential inclusion, proves convergence of the discretization to the associated gradient flow, and characterizes these flows via ∞-curves of maximum slope and Wasserstein gradient flows. It shows that normalized gradient descent methods converge, up to subsequences, to these flows as the step size tends to zero, and it characterizes the inner optimization problem of adversarial training via ∞-curves of maximum slope on an optimal transport space.
A popular method to perform adversarial attacks on neural networks is the so-called fast gradient sign method and its iterative variant. In this paper, we interpret this method as an explicit Euler discretization of a differential inclusion and show convergence of the discretization to the associated gradient flow. To do so, we consider the concept of $p$-curves of maximum slope in the case $p=\infty$. We prove existence of $\infty$-curves of maximum slope and derive an alternative characterization via differential inclusions. Furthermore, we consider Wasserstein gradient flows for potential energies, where we show that curves in the Wasserstein space can be characterized by a representing measure on the space of curves in the underlying Banach space that fulfill the differential inclusion. The application of our theory to the finite-dimensional setting is twofold: on the one hand, we show that a whole class of normalized gradient descent methods (in particular signed gradient descent) converges, up to subsequences, to the flow when the step size is sent to zero. On the other hand, in the distributional setting, we show that the inner optimization task of the adversarial training objective can be characterized via $\infty$-curves of maximum slope on an appropriate optimal transport space.
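As a rough illustration of the discretization viewpoint, the sketch below runs signed gradient descent, $x_{k+1} = x_k - \tau\,\mathrm{sign}(\nabla E(x_k))$, which can be read as explicit Euler steps of size $\tau$ for a flow whose velocity has $\ell^\infty$-norm at most one. The function name `signed_gradient_euler`, the toy quadratic energy, and the descent sign convention are assumptions made for this example and are not taken from the paper; for an attack one would take $E$ to be the negative loss.

```python
import numpy as np


def signed_gradient_euler(grad, x0, step, n_steps):
    """Explicit Euler iterates x_{k+1} = x_k - step * sign(grad(x_k)).

    `grad` is a callable returning the gradient of a (hypothetical) energy E.
    Returns an array of shape (n_steps + 1, dim) holding all iterates.
    """
    x = np.asarray(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        # Each coordinate moves by at most `step`, so one step has
        # l-infinity length at most `step`.
        x = x - step * np.sign(grad(x))
        path.append(x.copy())
    return np.stack(path)


# Toy energy (an assumption): E(x) = 0.5 * ||x - c||^2, so grad E(x) = x - c.
c = np.array([1.0, -2.0])
step, n_steps = 0.25, 4
path = signed_gradient_euler(lambda x: x - c, x0=[0.0, 0.0], step=step, n_steps=n_steps)

# After time T = n_steps * step, the iterate lies in the l-infinity ball of radius T.
linf_distance = np.max(np.abs(path[-1] - path[0]))
print(path[-1], linf_distance)
```

In this toy run the iterates stay inside the $\ell^\infty$-ball of radius $k\tau$ around the starting point after $k$ steps, which mirrors how the elapsed time of the flow plays the role of the attack budget. Refining $\tau$ while keeping $k\tau$ fixed is the limit in which the abstract states convergence up to subsequences.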