Assessing Robustness of Deep Learning Methods in Dermatological Workflow
This paper addresses the problem of unreliable AI performance in real-world clinical workflows for dermatologists, highlighting an incremental but critical gap in existing research.
The paper evaluated the robustness of deep learning methods in dermatology by simulating non-ideal clinical conditions on user-submitted images of ten disease classes, finding that overall accuracy dropped and individual predictions changed significantly despite robust training.
This paper evaluates the suitability of current deep learning methods for clinical workflows, focusing on dermatology. Although deep learning methods have been used in attempts to reach dermatologist-level accuracy on several individual conditions, they have not been rigorously tested on common clinical complaints. Most projects use data acquired under well-controlled laboratory conditions, which may not reflect routine clinical evaluation, where image quality is not always ideal. We test the robustness of deep learning methods by simulating non-ideal characteristics on user-submitted images of ten classes of diseases. Under these simulated conditions, we find that overall accuracy drops and that individual predictions change significantly in many cases, despite robust training.
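The evaluation idea described above (perturb each image, then compare predictions before and after) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say which non-ideal characteristics were simulated, so brightness change, Gaussian noise, and a box blur stand in as plausible examples, and the function names (`adjust_brightness`, `add_gaussian_noise`, `box_blur`, `robustness_report`) and the `predict` callable are hypothetical, not the authors' code. Images are assumed to be float arrays of shape H x W x C with values in [0, 1].

```python
import numpy as np


def adjust_brightness(image, factor):
    """Scale pixel intensities by `factor` and clip back to [0, 1]."""
    return np.clip(image * factor, 0.0, 1.0)


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    noisy = image + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0.0, 1.0)


def box_blur(image, k=3):
    """Blur an H x W x C image by averaging k x k shifted copies (edge-padded)."""
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = image.shape[:2]
    out = np.zeros(image.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def robustness_report(predict, images, labels, perturb):
    """Compare predictions on clean and perturbed copies of the same images.

    `predict` is any callable mapping one image to a class index
    (a stand-in for a trained classifier, which is not shown here).
    """
    labels = np.asarray(labels)
    clean = np.array([predict(img) for img in images])
    perturbed = np.array([predict(perturb(img)) for img in images])
    return {
        # Fraction of correct predictions on the original images.
        "clean_accuracy": float(np.mean(clean == labels)),
        # Fraction of correct predictions after perturbation.
        "perturbed_accuracy": float(np.mean(perturbed == labels)),
        # Fraction of images whose predicted class changed at all.
        "flip_rate": float(np.mean(clean != perturbed)),
    }
```

Reporting the flip rate alongside accuracy reflects the distinction the abstract draws between the drop in overall accuracy and changes in individual predictions: a model can keep similar aggregate accuracy while many single predictions change. A call such as `robustness_report(model_predict, images, labels, lambda im: adjust_brightness(im, 1.5))` would produce one report per simulated condition, where `model_predict` is a hypothetical wrapper around the classifier under test.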