ML · LG · Jun 21, 2023

Adversarial Training with Generated Data in High-Dimensional Regression: An Asymptotic Study

arXiv:2306.12582v1 · 1 citation · h-index: 8
Originality: Incremental advance
AI Analysis

This work addresses adversarial robustness in high-dimensional regression and offers both theoretical insights and a practical tool (a shortcut cross-validation formula). It is incremental because it builds on existing two-stage training approaches.

The paper studies adversarial training in high-dimensional linear regression by analyzing the asymptotic behavior of a two-stage method that trains on generated data with pseudo-labels. It finds that ridgeless training exhibits a double-descent phenomenon, while with appropriate L2 regularization the two-stage method achieves better performance. It also derives a shortcut cross-validation formula for this method.
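
The two-stage procedure summarized above can be sketched for linear regression as follows. This is a minimal, hypothetical illustration, not the paper's exact setup: it assumes Gaussian inputs, uses fresh Gaussian draws as a stand-in for generated data, fits stage 1 by ridge regression, and uses an $\ell_2$-bounded perturbation of radius `eps` in stage 2. For that perturbation set, the worst-case squared loss has the closed form $(|y - x^\top\theta| + \epsilon\|\theta\|_2)^2$. The names `two_stage`, `eps`, `lam1`, and `lam2` are illustrative.

```python
import numpy as np


def adversarial_loss(theta, X, y, eps, lam):
    """Hypothetical stage-2 objective: worst-case squared loss under an
    l2 perturbation of radius eps, plus an l2 (ridge) penalty lam.

    For linear models, max_{||delta||_2 <= eps} (y - (x + delta)^T theta)^2
    equals (|y - x^T theta| + eps * ||theta||_2)^2.
    """
    inner = np.abs(y - X @ theta) + eps * np.linalg.norm(theta)
    return np.mean(inner ** 2) + lam * (theta @ theta)


def adversarial_grad(theta, X, y, eps, lam):
    """(Sub)gradient of adversarial_loss; assumes theta != 0."""
    resid = y - X @ theta
    norm = np.linalg.norm(theta)
    inner = np.abs(resid) + eps * norm                    # shape (m,)
    # d/dtheta |y_i - x_i^T theta| = -sign(resid_i) * x_i (0 at a kink)
    g = (-np.sign(resid) * inner) @ X + inner.sum() * eps * theta / norm
    return 2.0 * g / len(y) + 2.0 * lam * theta


def ridge(X, y, lam):
    """Closed-form minimizer of (1/n)||y - X theta||^2 + lam ||theta||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)


def two_stage(X_lab, y_lab, X_gen, eps, lam1, lam2, lr=0.05, steps=2000):
    # Stage 1: standard (non-adversarial) ridge fit on the labeled data.
    theta1 = ridge(X_lab, y_lab, lam1)
    # Pseudo-label the generated inputs with the stage-1 model.
    y_pseudo = X_gen @ theta1
    # Stage 2: regularized adversarial training on the pseudo-labeled data
    # by subgradient descent, keeping the best iterate seen.
    theta = theta1.copy()
    best = theta.copy()
    best_loss = adversarial_loss(theta, X_gen, y_pseudo, eps, lam2)
    for _ in range(steps):
        theta -= lr * adversarial_grad(theta, X_gen, y_pseudo, eps, lam2)
        loss = adversarial_loss(theta, X_gen, y_pseudo, eps, lam2)
        if loss < best_loss:
            best, best_loss = theta.copy(), loss
    return theta1, best


# Hypothetical synthetic setup (not the paper's experiments).
rng = np.random.default_rng(0)
d, n, m = 50, 100, 2000
theta_star = rng.normal(size=d) / np.sqrt(d)
X_lab = rng.normal(size=(n, d))
y_lab = X_lab @ theta_star + 0.5 * rng.normal(size=n)
X_gen = rng.normal(size=(m, d))  # stand-in for generator output

theta1, theta2 = two_stage(X_lab, y_lab, X_gen, eps=0.1, lam1=0.1, lam2=0.01)
print("stage-1 parameter error:", np.sum((theta1 - theta_star) ** 2))
print("stage-2 parameter error:", np.sum((theta2 - theta_star) ** 2))
```

Tracking the best iterate guarantees the subgradient loop never returns a point worse than its starting value. The printed errors depend on the chosen `eps`, `lam1`, `lam2`, and sample sizes, and are not results from the paper.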

In recent years, studies such as \cite{carmon2019unlabeled,gowal2021improving,xing2022artificial} have demonstrated that incorporating additional real or generated data with pseudo-labels can enhance adversarial training through a two-stage training approach. In this paper, we perform a theoretical analysis of the asymptotic behavior of this method in high-dimensional linear regression. While a double-descent phenomenon can be observed in ridgeless training, with an appropriate $\mathcal{L}_2$ regularization the two-stage adversarial training achieves better performance. Finally, we derive a shortcut cross-validation formula specifically tailored for the two-stage training method.
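
The paper's shortcut formula is specific to the two-stage method and is not reproduced here. As a hedged illustration of what a shortcut cross-validation formula does, the sketch below uses the classical leave-one-out identity for ordinary ridge regression, which recovers every leave-one-out residual from a single fit through the diagonal of the hat matrix $H = X(X^\top X + \alpha I)^{-1}X^\top$. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def ridge_loocv_shortcut(X, y, alpha):
    """Leave-one-out CV error of ridge regression from a single fit.

    Uses the identity y_i - x_i^T theta_{-i} = e_i / (1 - H_ii), where
    H = X (X^T X + alpha I)^{-1} X^T and e = y - H y. It is exact when the
    leave-one-out fits use the same penalty alpha.
    """
    d = X.shape[1]
    H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T)
    e = y - H @ y
    loo_resid = e / (1.0 - np.diag(H))
    return np.mean(loo_resid ** 2)


def ridge_loocv_bruteforce(X, y, alpha):
    """Same quantity computed by refitting n times (for checking only)."""
    n, d = X.shape
    errs = []
    for i in range(n):
        mask = np.arange(n) != i
        Xi, yi = X[mask], y[mask]
        theta = np.linalg.solve(Xi.T @ Xi + alpha * np.eye(d), Xi.T @ yi)
        errs.append((y[i] - X[i] @ theta) ** 2)
    return np.mean(errs)


# Hypothetical synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 10))
y = X @ rng.normal(size=10) + 0.3 * rng.normal(size=40)
for alpha in (0.1, 1.0, 10.0):
    print(alpha, ridge_loocv_shortcut(X, y, alpha), ridge_loocv_bruteforce(X, y, alpha))
```

Choosing the `alpha` with the smallest shortcut error selects the regularization strength by leave-one-out at the cost of one fit per candidate. A two-stage analogue would also have to account for the stage-1 fit that produces the pseudo-labels, which this sketch does not attempt.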

