LG · Jun 19, 2024

Large-Scale Dataset Pruning in Adversarial Training through Data Importance Extrapolation

arXiv:2406.13283v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the long training times that deep learning practitioners face with adversarial training, but the contribution is incremental: it applies existing data pruning concepts to a new context.

The paper tackles the high computational cost of adversarial training by proposing a data pruning strategy that extrapolates importance scores from a small dataset to a larger one, demonstrating efficient dataset size reduction while maintaining robustness.

The vulnerability of deep learning models to small, imperceptible attacks limits their adoption in real-world systems. Adversarial training has proven to be one of the most promising defenses against these attacks, at the expense of a substantial increase in training time. With the ongoing trend of integrating large-scale synthetic data, this cost is expected to grow even further. There is therefore a need for data-centric approaches that reduce the number of training samples while maintaining accuracy and robustness. While data pruning and active learning are prominent research topics in deep learning, they are as of now largely unexplored in the adversarial training literature. We address this gap and propose a new data pruning strategy based on extrapolating data importance scores from a small set of data to a larger set. In an empirical evaluation, we demonstrate that extrapolation-based pruning can efficiently reduce dataset size while maintaining robustness.
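The abstract does not say how importance scores are extrapolated, so the following is only a toy sketch of the general idea: score a small subset, transfer those scores to a larger set, then prune. Everything specific here is an assumption made for illustration and not the authors' method: the nearest-neighbour averaging in feature space, the rule of keeping the highest-scoring samples, the placeholder importance values, and the hypothetical helper names `extrapolate_scores` and `prune`.

```python
import math
import random


def extrapolate_scores(small_feats, small_scores, large_feats, k=3):
    """Assign each unscored sample the mean score of its k nearest scored samples.

    ASSUMPTION: k-nearest-neighbour averaging is a stand-in; the abstract does
    not specify the extrapolation rule.
    """
    out = []
    for x in large_feats:
        # Rank scored samples by Euclidean distance to x and keep the k closest.
        nearest = sorted(
            range(len(small_feats)),
            key=lambda i: math.dist(x, small_feats[i]),
        )[:k]
        out.append(sum(small_scores[i] for i in nearest) / len(nearest))
    return out


def prune(scores, keep_fraction):
    """Return the sorted indices of the highest-scoring samples.

    ASSUMPTION: keeping high-importance samples is one possible pruning rule.
    """
    n_keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


rng = random.Random(0)

# Small set: 2-D toy features with placeholder importance scores.
# In practice the scores would come from some importance measure
# computed on the small set.
small_feats = [(rng.random(), rng.random()) for _ in range(20)]
small_scores = [f[0] for f in small_feats]  # hypothetical importance values

# Large set: features only; scores are extrapolated rather than computed.
large_feats = [(rng.random(), rng.random()) for _ in range(200)]
large_scores = extrapolate_scores(small_feats, small_scores, large_feats, k=3)

# Keep half of the large set for the (expensive) adversarial training run.
kept = prune(large_scores, keep_fraction=0.5)
print(len(kept), "of", len(large_feats), "samples kept")
```

The point of the sketch is the cost structure: importance is computed directly only for the small set, while the large set needs nothing more than a feature lookup, so the pruning step stays cheap relative to adversarial training on the full data.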

Code Implementations: 1 repo