ML · CV · LG · NE · May 30, 2018

Robustness May Be at Odds with Accuracy

arXiv:1805.12152v5 · 1952 citations
Originality: Incremental advance
AI Analysis

This work highlights a fundamental tension in machine learning that affects model design and evaluation, with implications for security and generalization in AI systems.

The paper proves that a trade-off between adversarial robustness and standard accuracy exists in a simple setting. It shows that robust training can lower standard accuracy, and that robust models learn different feature representations that align better with human perception.

We show that there may exist an inherent tension between the goal of adversarial robustness and that of standard generalization. Specifically, training robust models may not only be more resource-consuming, but also lead to a reduction of standard accuracy. We demonstrate that this trade-off between the standard accuracy of a model and its robustness to adversarial perturbations provably exists in a fairly simple and natural setting. These findings also corroborate a similar phenomenon observed empirically in more complex settings. Further, we argue that this phenomenon is a consequence of robust classifiers learning fundamentally different feature representations than standard classifiers. These differences, in particular, seem to result in unexpected benefits: the representations learned by robust models tend to align better with salient data characteristics and human perception.
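The abstract does not spell out the "fairly simple and natural setting", so the following is only an illustrative sketch under stated assumptions: a binary task with one feature that agrees with the label with probability p, plus d weakly correlated Gaussian features with mean eta * y. The values of d, p, eta, and eps, and all function names, are hypothetical choices made here for illustration, not the authors' code. The sketch compares a classifier that averages the weak features with one that uses only the strong feature, on clean inputs and under an l-infinity shift of size eps.

```python
import math
import random


def sample(n, d, p, eta, rng):
    """Draw n labelled points from the assumed toy distribution.

    y is uniform on {-1, +1}; x[0] equals y with probability p (else -y);
    x[1..d] are independent Gaussians with mean eta * y and unit variance.
    """
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        strong = y if rng.random() < p else -y
        weak = [rng.gauss(eta * y, 1.0) for _ in range(d)]
        data.append(([float(strong)] + weak, y))
    return data


def sign(v):
    return 1 if v >= 0 else -1


def standard_clf(x):
    # Averages the d weak features and ignores the strong one.
    return sign(sum(x[1:]))


def robust_clf(x):
    # Relies only on the single strongly correlated feature.
    return sign(x[0])


def attack(x, y, eps):
    # l-infinity shift of size eps: move every coordinate against the label.
    # For the two sign-of-sum classifiers above this is the worst-case shift.
    return [xi - eps * y for xi in x]


def accuracy(clf, data, eps=0.0):
    correct = 0
    for x, y in data:
        x_in = attack(x, y, eps) if eps > 0 else x
        correct += int(clf(x_in) == y)
    return correct / len(data)


def run_demo(n=2000, d=100, p=0.95, seed=0):
    rng = random.Random(seed)
    eta = 3 / math.sqrt(d)  # assumed weak-feature signal strength
    eps = 2 * eta           # assumed budget, enough to flip the weak signal
    data = sample(n, d, p, eta, rng)
    return {
        "standard_clean": accuracy(standard_clf, data),
        "standard_adv": accuracy(standard_clf, data, eps),
        "robust_clean": accuracy(robust_clf, data),
        "robust_adv": accuracy(robust_clf, data, eps),
    }


results = run_demo()
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Under these assumed parameters, the averaging classifier should be close to perfect on clean data and almost always wrong after the shift. The single-feature classifier should stay near p in both cases, since a shift smaller than 1 cannot change the sign of a feature valued ±1. This is the kind of gap between standard accuracy and robustness that the abstract describes.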

Code Implementations: 8 repos

Data from Papers with Code (CC-BY-SA-4.0)

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
