LG AI | Apr 16, 2025

RDI: An adversarial robustness evaluation metric for deep neural networks based on model statistical features

Jialei Song, Xingquan Zuo, Feiyang Wang, Hai Huang, Tianle Zhang

arXiv:2504.18556v29.42 citations | h-index: 3 | Has Code | UAI

Originality: Incremental advance

AI Analysis

RDI gives researchers and practitioners concerned with adversarial robustness in safety-critical applications a more efficient, attack-independent evaluation method. The advance is incremental, since it builds on existing robustness evaluation approaches.

The paper tackles the problem of evaluating adversarial robustness in deep neural networks by proposing the Robustness Difference Index (RDI), a metric based on model statistical features. RDI shows a stronger correlation with attack success rate (ASR), and its average computation time is about 1/30 of that of the PGD-attack-based evaluation method.

Deep neural networks (DNNs) are highly susceptible to adversarial samples, raising concerns about their reliability in safety-critical tasks. Current methods for evaluating adversarial robustness fall primarily into two categories: attack-based evaluation and certified robustness evaluation. The former depends on specific attack algorithms and is highly time-consuming, while the latter, due to its analytical nature, is typically difficult to implement for large and complex models. A few studies evaluate model robustness based on the model's decision boundary, but they suffer from low evaluation accuracy. To address these issues, we propose a novel adversarial robustness evaluation metric, the Robustness Difference Index (RDI), which is based on model statistical features. RDI draws inspiration from clustering evaluation: it analyzes the intra-class and inter-class distances of feature vectors separated by the decision boundary to quantify model robustness. It is attack-independent and computationally efficient. Experiments show that RDI demonstrates a stronger correlation with attack success rate (ASR), the gold-standard adversarial robustness metric. The average computation time of RDI is only 1/30 of that of the evaluation method based on the PGD attack. Our open-source code is available at: https://github.com/BUPTAIOC/RDI.
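The abstract describes RDI only at a high level, so the following is a minimal sketch of the general idea of comparing intra-class and inter-class feature distances, not the authors' formula. The function name `separation_index`, the use of class centroids, Euclidean distance, and the normalization `(inter - intra) / (inter + intra)` are all assumptions made for illustration. The `labels` argument is assumed to hold the class the model assigns to each feature vector.

```python
import numpy as np


def separation_index(features, labels):
    """Hypothetical clustering-style score in [-1, 1] (not the paper's RDI).

    Higher values mean classes are tight and far apart in feature space.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted unique class labels
    if classes.size < 2:
        raise ValueError("need at least two classes")

    # One centroid per class (assumption: centroids summarize each class).
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])

    # Intra-class distance: mean distance of each sample to its own centroid.
    own = centroids[np.searchsorted(classes, labels)]
    intra = np.linalg.norm(features - own, axis=1).mean()

    # Inter-class distance: mean distance over distinct centroid pairs.
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    inter = pairwise[np.triu_indices(classes.size, k=1)].mean()

    total = inter + intra
    return 0.0 if total == 0 else float((inter - intra) / total)


# Two toy feature sets: tight, distant classes versus spread-out, close classes.
tight = separation_index([[0, 0], [0, 0.1], [10, 0], [10, 0.1]], [0, 0, 1, 1])
loose = separation_index([[0, 0], [0, 4], [1, 0], [1, 4]], [0, 0, 1, 1])
```

In this sketch `tight` comes out larger than `loose`, which matches the intuition that well-separated classes leave more room before a perturbation crosses the decision boundary. Like the metric described in the abstract, the score needs only forward passes to obtain features and no attack runs, which is where the efficiency gain over attack-based evaluation would come from.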
