Narrowing Class-Wise Robustness Gaps in Adversarial Training
This paper addresses class-wise robustness gaps in adversarial training for machine learning models and represents an incremental improvement.
The paper tackles a side effect of adversarial training: while it improves robustness, it can also exacerbate performance imbalances across classes. It shows that enhanced labeling during training boosts adversarial robustness by 53.50% and reduces class imbalances by 5.73%, improving accuracy in both clean and adversarial settings compared to standard adversarial training.
Efforts to address declining accuracy as a result of data shifts often involve various data-augmentation strategies. Adversarial training is one such method, designed to improve robustness to worst-case distribution shifts caused by adversarial examples. While this method can improve robustness, it may also hinder generalization to clean examples and exacerbate performance imbalances across different classes. This paper explores the impact of adversarial training on both overall and class-specific performance, as well as its spill-over effects. We observe that enhanced labeling during training boosts adversarial robustness by 53.50% and mitigates class imbalances by 5.73%, leading to improved accuracy in both clean and adversarial settings compared to standard adversarial training.
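The abstract does not state how class imbalance is measured. As a rough illustration only, one common way to quantify a class-wise gap is the difference between the best and worst per-class accuracy, computed separately on clean and adversarial predictions. The sketch below assumes that definition; the function names `per_class_accuracy` and `class_gap` and the toy labels are hypothetical and are not taken from the paper.

```python
def per_class_accuracy(y_true, y_pred, num_classes):
    """Return the accuracy for each class, or None for classes with no samples."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return [correct[c] / total[c] if total[c] else None
            for c in range(num_classes)]


def class_gap(acc):
    """Assumed imbalance measure: best minus worst per-class accuracy."""
    seen = [a for a in acc if a is not None]
    return max(seen) - min(seen)


# Toy labels and predictions, made up for illustration (not results from the paper).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
pred_clean = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2]  # predictions on clean inputs
pred_adv = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0, 0, 0]    # predictions on adversarial inputs

clean_acc = per_class_accuracy(y_true, pred_clean, 3)
adv_acc = per_class_accuracy(y_true, pred_adv, 3)

print("clean per-class accuracy:", clean_acc, "gap:", class_gap(clean_acc))
print("adversarial per-class accuracy:", adv_acc, "gap:", class_gap(adv_acc))
```

In this toy data the gap is larger on adversarial inputs than on clean ones, which is the kind of class-wise disparity the abstract describes. Other measures, such as worst-class accuracy alone or the variance of per-class accuracy, would be equally reasonable choices.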