IV CVAug 14, 2023

Robustness Stress Testing in Medical Image Classification

arXiv:2308.06889v212.810 citationsh-index: 61Has Code

Originality Synthesis-oriented

AI Analysis

This work addresses the need for better validation of disease detection models in medical imaging to ensure generalizability and fairness, though it is incremental as it applies existing stress testing concepts to this domain.

The paper tackled the problem of assessing the robustness and subgroup performance disparities of deep neural networks in medical image classification by employing progressive stress testing with image perturbations, finding that some models yield more robust and equitable performance and that pretraining characteristics influence downstream robustness.

Deep neural networks have shown impressive performance for image-based disease detection. Performance is commonly evaluated through clinical validation on independent test sets to demonstrate clinically acceptable accuracy. Reporting good performance metrics on test sets, however, is not always a sufficient indication of the generalizability and robustness of an algorithm. In particular, when the test data is drawn from the same distribution as the training data, the iid test set performance can be an unreliable estimate of the accuracy on new data. In this paper, we employ stress testing to assess model robustness and subgroup performance disparities in disease detection models. We design progressive stress testing using five different bidirectional and unidirectional image perturbations with six different severity levels. As a use case, we apply stress tests to measure the robustness of disease detection models for chest X-ray and skin lesion images, and demonstrate the importance of studying class and domain-specific model behaviour. Our experiments indicate that some models may yield more robust and equitable performance than others. We also find that pretraining characteristics play an important role in downstream robustness. We conclude that progressive stress testing is a viable and important tool and should become standard practice in the clinical validation of image-based disease detection models.

View on arXiv PDF Code

Similar