LG, CV, ML · May 20, 2025

Adversarial Training from Mean Field Perspective

arXiv:2505.14021v1 · 1 citation · h-index: 3
Originality: Highly original
AI Analysis

This work gives a theoretical account of adversarial training dynamics, which remain poorly understood even though the method is known to be effective against adversarial examples.

The authors analyze adversarial training in random deep neural networks using a new framework based on mean field theory, with no assumptions on the data distribution. They derive empirically tight upper bounds on the adversarial loss, prove that networks without shortcuts are generally not adversarially trainable and that adversarial training reduces network capacity, and show that greater network width alleviates both issues.

Although adversarial training is known to be effective against adversarial examples, training dynamics are not well understood. In this study, we present the first theoretical analysis of adversarial training in random deep neural networks without any assumptions on data distributions. We introduce a new theoretical framework based on mean field theory, which addresses the limitations of existing mean field-based approaches. Based on this framework, we derive (empirically tight) upper bounds of $\ell_q$ norm-based adversarial loss with $\ell_p$ norm-based adversarial examples for various values of $p$ and $q$. Moreover, we prove that networks without shortcuts are generally not adversarially trainable and that adversarial training reduces network capacity. We also show that network width alleviates these issues. Furthermore, we present the various impacts of the input and output dimensions on the upper bounds and time evolution of the weight variance.
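
To make the phrase "$\ell_q$ norm-based adversarial loss with $\ell_p$ norm-based adversarial examples" concrete, one common formalization is sketched below. This is an assumption for illustration, not necessarily the paper's exact definition: the symbols $f$ (the network), $x$ (an input), $\eta$ (the perturbation), and $\epsilon$ (the perturbation budget) are introduced here and do not appear in the abstract.

$$
\mathcal{L}_{\mathrm{adv}}(x) \;=\; \max_{\|\eta\|_p \le \epsilon} \bigl\| f(x + \eta) - f(x) \bigr\|_q
$$

As a sanity check on this form, take the linear map $f(x) = Wx$. The loss then no longer depends on $x$ and reduces to a scaled induced operator norm:

$$
\max_{\|\eta\|_p \le \epsilon} \| W \eta \|_q \;=\; \epsilon \, \|W\|_{p \to q},
\qquad
\|W\|_{p \to q} := \max_{\|v\|_p \le 1} \|W v\|_q .
$$

For $p = q = \infty$, for example, $\|W\|_{\infty \to \infty}$ is the largest $\ell_1$ norm of any row of $W$. The bounds described in the abstract concern deep random networks, where no such closed form is available in general.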

