Adversarially-Aware Architecture Design for Robust Medical AI Systems
This work addresses vulnerabilities in medical AI systems that threaten patient safety, particularly for underserved populations, but it is incremental: it applies known defense methods to a specific domain.
The study investigated adversarial attacks on healthcare AI systems using a dermatological dataset and found that these attacks significantly reduce classification accuracy. Defenses such as adversarial training reduce attack success rates, but that gain must be balanced against performance on clean data.
Adversarial attacks pose a severe risk to AI systems used in healthcare: they can mislead models into dangerous misclassifications that delay treatment or cause misdiagnoses. These attacks are often imperceptible to humans and threaten patient safety, particularly in underserved populations. Our study explores these vulnerabilities through empirical experiments on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses such as adversarial training and distillation. Our results show that while defenses reduce attack success rates, they must be balanced against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.
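The trade-off between attack success rate and clean-data performance can be made concrete with a small metric sketch. The abstract does not define these metrics, so the code below assumes one common convention: attack success rate is the fraction of inputs the model classified correctly before the attack that it misclassifies afterward. The function names and the toy labels are hypothetical, made up purely for illustration; they are not the study's code or data.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Fraction of predictions that match the ground-truth labels."""
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one example")
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def attack_success_rate(
    labels: Sequence[int],
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
) -> float:
    """Assumed definition: among inputs classified correctly on clean data,
    the fraction that are misclassified once the input is perturbed."""
    if not (len(labels) == len(clean_preds) == len(adv_preds)):
        raise ValueError("all sequences must have the same length")
    correct_before = 0
    flipped = 0
    for y, clean, adv in zip(labels, clean_preds, adv_preds):
        if clean == y:
            correct_before += 1
            if adv != y:
                flipped += 1
    # With no correct clean predictions there is nothing left to attack.
    return flipped / correct_before if correct_before else 0.0


if __name__ == "__main__":
    # Toy, made-up predictions (not results from the study).
    labels = [0, 1, 1, 0, 1, 0, 1, 1]
    runs = {
        "undefended": ([0, 1, 1, 0, 1, 0, 1, 0], [1, 0, 1, 1, 0, 0, 0, 0]),
        "defended": ([0, 1, 1, 0, 0, 0, 1, 0], [0, 1, 1, 1, 0, 0, 1, 0]),
    }
    for name, (clean, adv) in runs.items():
        print(
            name,
            "clean accuracy:", round(accuracy(labels, clean), 3),
            "attack success rate:",
            round(attack_success_rate(labels, clean, adv), 3),
        )
```

Reporting both numbers side by side for each model is what exposes the balance described above: in the toy data, the hypothetical defended model has a lower attack success rate but also a lower clean accuracy than the undefended one.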