IV CV LG

Dec 15, 2022

On Evaluating Adversarial Robustness of Chest X-ray Classification: Pitfalls and Best Practices

Salah Ghamizi, Maxime Cordy, Michail Papadakis, Yves Le Traon

arXiv:2212.08130v16.65 citations

h-index: 66

Originality: Incremental advance

AI Analysis

This work addresses the challenge of evaluating robustness in medical AI and highlights pitfalls for researchers and practitioners. It is an incremental advance because it builds on existing adversarial robustness research.

The paper tackles the problem of evaluating adversarial robustness in chest X-ray classification and shows that assessments vary significantly with the dataset, the architecture, and the robustness metric. Its evaluation covers 3 datasets, 7 models, and 18 diseases, which the authors describe as the largest evaluation of its kind.

Vulnerability to adversarial attacks is a well-known weakness of deep neural networks. While most studies focus on natural images with standardized benchmarks like ImageNet and CIFAR, little research has considered real-world applications, in particular in the medical domain. Our research shows that, contrary to previous claims, the robustness of chest X-ray classification is much harder to evaluate and leads to very different assessments depending on the dataset, the architecture, and the robustness metric. We argue that previous studies did not take into account the peculiarities of medical diagnosis, such as the co-occurrence of diseases, the disagreement among labellers (domain experts), the threat model of the attacks, and the risk implications of each successful attack. In this paper, we discuss the methodological foundations, review the pitfalls and best practices, and suggest new methodological considerations for evaluating the robustness of chest X-ray classification models. Our evaluation on 3 datasets, 7 models, and 18 diseases is the largest evaluation of the robustness of chest X-ray classification models.
