Theoretically Principled Trade-off between Robustness and Accuracy
This work addresses the fundamental problem of balancing robustness and accuracy in machine learning models for security applications, offering a novel theoretical framework and a practical defense.
The authors tackled the trade-off between robustness and accuracy in adversarial defenses by providing a theoretical decomposition of robust error and proposing a new method, TRADES, which formed the basis of their first-place entry in the NeurIPS 2018 Adversarial Vision Challenge; that entry surpassed the runner-up by 11.41% in mean $\ell_2$ perturbation distance.
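To make the decomposition concrete, here is a minimal sketch on a toy problem. It assumes a one-dimensional, non-decreasing score $f$, labels $y \in \{-1, +1\}$, and the perturbation set $B(x, \epsilon) = [x - \epsilon, x + \epsilon]$. Following the definitions as we understand them from the paper, a point counts toward natural error if $f(x)y \le 0$, toward boundary error if it is correctly classified but some $x' \in B(x, \epsilon)$ has $f(x)f(x') \le 0$, and toward robust error if some $x' \in B(x, \epsilon)$ has $f(x')y \le 0$. The empirical rates then satisfy $\mathcal{R}_{\rm rob}(f) = \mathcal{R}_{\rm nat}(f) + \mathcal{R}_{\rm bdy}(f)$. The threshold score, the data points, and every function name below are hypothetical illustrations, not the authors' code.

```python
from fractions import Fraction
from typing import Callable, Sequence, Tuple

# Hypothetical toy setting: a non-decreasing 1-D score f, labels y in {-1, +1},
# and the perturbation set B(x, eps) = [x - eps, x + eps].
Score = Callable[[float], float]
Data = Sequence[Tuple[float, int]]


def threshold_score(x: float, t: float = 0.0) -> float:
    """Illustrative linear score f(x) = x - t; the prediction is sign(f(x))."""
    return x - t


def natural_error(f: Score, data: Data) -> Fraction:
    """Fraction of points misclassified at x itself: f(x) * y <= 0."""
    return Fraction(sum(1 for x, y in data if f(x) * y <= 0), len(data))


def boundary_error(f: Score, data: Data, eps: float) -> Fraction:
    """Correctly classified points whose eps-ball reaches the decision boundary."""
    count = 0
    for x, y in data:
        fx = f(x)
        # f is monotone, so f(x) * f(x') is smallest at an endpoint of the interval.
        if fx * y > 0 and any(fx * f(xp) <= 0 for xp in (x - eps, x + eps)):
            count += 1
    return Fraction(count, len(data))


def robust_error(f: Score, data: Data, eps: float) -> Fraction:
    """Points for which some x' in [x - eps, x + eps] has f(x') * y <= 0."""
    # f(x') * y is monotone in x', so checking both endpoints finds its minimum.
    count = sum(
        1 for x, y in data if any(f(xp) * y <= 0 for xp in (x - eps, x + eps))
    )
    return Fraction(count, len(data))


EXAMPLE_DATA = [(-2.0, -1), (-0.4, -1), (0.3, 1), (1.5, 1), (0.2, -1), (-1.0, 1)]

if __name__ == "__main__":
    eps = 0.5
    nat = natural_error(threshold_score, EXAMPLE_DATA)
    bdy = boundary_error(threshold_score, EXAMPLE_DATA, eps)
    rob = robust_error(threshold_score, EXAMPLE_DATA, eps)
    print(nat, bdy, rob)  # prints: 1/3 1/3 2/3
    assert rob == nat + bdy  # R_rob = R_nat + R_bdy on this sample
```

Because the score is monotone, its extreme values over each interval occur at the endpoints, which is why checking $x \pm \epsilon$ suffices here; a neural network offers no such shortcut, and the inner search over $B(x, \epsilon)$ must be approximated.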
We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples. Although this problem has been widely studied empirically, much remains unknown concerning the theory underlying this trade-off. In this work, we decompose the prediction error for adversarial examples (robust error) as the sum of the natural (classification) error and boundary error, and provide a differentiable upper bound using the theory of classification-calibrated loss, which is shown to be the tightest possible upper bound uniform over all probability distributions and measurable predictors. Inspired by our theoretical analysis, we also design a new defense method, TRADES, to trade adversarial robustness off against accuracy. Our proposed algorithm performs well experimentally on real-world datasets. The methodology is the foundation of our entry to the NeurIPS 2018 Adversarial Vision Challenge, in which we won 1st place out of ~2,000 submissions, surpassing the runner-up approach by $11.41\%$ in terms of mean $\ell_2$ perturbation distance.
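The surrogate objective behind TRADES can likewise be sketched for a binary linear model. The sketch assumes the binary objective $\min_f \mathbb{E}\{\phi(f(X)Y) + \max_{X' \in B(X, \epsilon)} \phi(f(X)f(X')/\lambda)\}$, with the logistic loss $\phi(\alpha) = \log(1 + e^{-\alpha})$ as the classification-calibrated surrogate and an $\ell_\infty$ ball for $B(X, \epsilon)$. These choices, the linear score $f(x) = w^\top x + b$, and the function names are illustrative assumptions rather than the authors' implementation. For a linear score the inner maximization has a closed form: since $\phi$ is decreasing and $\lambda > 0$, it suffices to minimize $f(x)f(x')$, whose minimum over the ball is $f(x)^2 - \epsilon\,|f(x)|\,\|w\|_1$.

```python
import math
from typing import Sequence, Tuple

# Hypothetical illustration: binary labels y in {-1, +1}, linear score
# f(x) = w.x + b, logistic surrogate phi, and an l_inf ball of radius eps.
Data = Sequence[Tuple[Sequence[float], int]]


def logistic_loss(margin: float) -> float:
    """phi(alpha) = log(1 + exp(-alpha)), computed without overflow."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def linear_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def worst_case_agreement(
    w: Sequence[float], b: float, x: Sequence[float], eps: float
) -> float:
    """Minimum of f(x) * f(x') over ||x' - x||_inf <= eps, for linear f.

    f(x') = f(x) + w.delta, and w.delta ranges over [-eps*||w||_1, eps*||w||_1],
    so the product is smallest when w.delta opposes the sign of f(x).
    """
    fx = linear_score(w, b, x)
    w_l1 = sum(abs(wi) for wi in w)
    return fx * fx - eps * abs(fx) * w_l1


def trades_objective(
    w: Sequence[float], b: float, data: Data, eps: float, lam: float
) -> float:
    """Empirical mean of phi(f(x) y) + max_{x'} phi(f(x) f(x') / lam)."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    total = 0.0
    for x, y in data:
        fx = linear_score(w, b, x)
        natural_term = logistic_loss(fx * y)
        # phi is decreasing, so maximizing phi(.) means minimizing f(x) f(x').
        boundary_term = logistic_loss(worst_case_agreement(w, b, x, eps) / lam)
        total += natural_term + boundary_term
    return total / len(data)


if __name__ == "__main__":
    w, b = [1.0, -2.0], 0.5
    data = [([0.5, 0.25], 1), ([-1.0, 0.5], -1)]
    for eps in (0.0, 0.1, 0.2):
        print(eps, trades_objective(w, b, data, eps, lam=1.0))
```

The first term rewards natural accuracy and the second penalizes points whose $\epsilon$-ball comes close to the decision boundary, with $\lambda$ acting as the trade-off parameter. For deep multiclass models the inner maximization has no closed form and is typically approximated with a few projected-gradient steps; the linear model here sidesteps that.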