LG, ML · Jul 30, 2023

On Neural Network approximation of ideal adversarial attack and convergence of adversarial training

arXiv:2307.16099

Originality: Incremental advance

AI Analysis

This work addresses the computational cost of generating adversarial attacks, which matters to both researchers and practitioners in adversarial machine learning. It appears incremental, however, because it builds on the existing idea of representing attacks as functions.

The authors tackle the computational inefficiency of generating adversarial attacks by representing the attack as a trainable neural network, which removes the need for a gradient computation each time an attack is generated. They prove that ideal attacks can be approximated by neural networks and derive convergence rates for adversarial training in this framework.

Adversarial attacks are usually expressed as a gradient-based operation on the input data and the model, which results in heavy computation every time an attack is generated. In this work, we solidify the idea of representing adversarial attacks as a trainable function that requires no further gradient computation. We first argue that the theoretically best attacks, under proper conditions, can be represented as smooth piecewise functions (piecewise Hölder functions). We then obtain an approximation result for such functions by a neural network. Subsequently, we emulate the ideal attack process with a neural network and reduce adversarial training to a mathematical game between an attack network and a training model (a defense network). We also obtain convergence rates of the adversarial loss in terms of the sample size $n$ for adversarial training in this setting.
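
The game between the attack network and the defense network can be written as a min-max problem. The following is only a sketch under assumed notation, none of which appears in the abstract: $f$ is the defense network from a class $\mathcal{F}$, $a_\theta$ is the attack network with parameters $\theta$, $\ell$ is the loss, $\varepsilon$ is the perturbation budget, and the norm $\|\cdot\|$ is left unspecified. Letting the attack depend on the label $y$ as well as the input $x$ is also an assumption. Conventional adversarial training solves a separate inner maximization for every sample:

$$
\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \max_{\|\delta_i\| \le \varepsilon} \ell\bigl(f(x_i + \delta_i),\, y_i\bigr)
$$

Representing the attack as a trainable function replaces the per-sample maximization with a single maximization over the attack parameters:

$$
\min_{f \in \mathcal{F}} \; \max_{\theta} \; \frac{1}{n} \sum_{i=1}^{n} \ell\bigl(f(x_i + a_\theta(x_i, y_i)),\, y_i\bigr), \qquad \|a_\theta(x, y)\| \le \varepsilon
$$

Under this reading, the ideal attack is the map $(x, y) \mapsto \arg\max_{\|\delta\| \le \varepsilon} \ell\bigl(f(x + \delta),\, y\bigr)$, and $a_\theta$ is the network meant to approximate it. Once $a_\theta$ is trained, producing an attack takes one forward pass and no gradient computation with respect to the input.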
