IV · CV · LG · Dec 15, 2022

On Evaluating Adversarial Robustness of Chest X-ray Classification: Pitfalls and Best Practices

arXiv:2212.08130v1 · 5 citations · h-index: 66
Originality: Incremental advance
AI Analysis

This work addresses the challenge of robust evaluation in medical AI, highlighting pitfalls for researchers and practitioners, but it is incremental as it builds on existing adversarial robustness research.

The paper examines how to evaluate adversarial robustness in chest X-ray classification and shows that assessments vary significantly with the dataset, the architecture, and the robustness metric. Its evaluation covers 3 datasets, 7 models, and 18 diseases, which the authors describe as the largest of its kind.

Vulnerability to adversarial attacks is a well-known weakness of deep neural networks. While most studies focus on natural images with standardized benchmarks like ImageNet and CIFAR, little research has considered real-world applications, in particular in the medical domain. Our research shows that, contrary to previous claims, the robustness of chest X-ray classification is much harder to evaluate and leads to very different assessments depending on the dataset, the architecture, and the robustness metric. We argue that previous studies did not take into account the peculiarities of medical diagnosis, such as the co-occurrence of diseases, the disagreement among labellers (domain experts), the threat model of the attacks, and the risk implications of each successful attack. In this paper, we discuss the methodological foundations, review the pitfalls and best practices, and suggest new methodological considerations for evaluating the robustness of chest X-ray classification models. Our evaluation on 3 datasets, 7 models, and 18 diseases is the largest evaluation of the robustness of chest X-ray classification models.

