The Ability of Image-Language Explainable Models to Resemble Domain Expertise
This work addresses the need for transparency and domain expertise in vision-language models for healthcare, though the contribution appears incremental: it applies existing explainability methods to a specific domain.
The paper tackles the problem of black-box deep learning models in healthcare by using a local surrogate explainability technique to generate multi-modal visual and language explanations, and demonstrates that these explanations can guide model training for data scientists and machine learning engineers.
Recent advances in vision and language (V+L) models show promise for the healthcare field. However, such models struggle to explain how and why a particular decision was made. In addition, model transparency and the involvement of domain expertise are critical success factors for machine learning models to gain adoption in the field. In this work, we study the use of the local surrogate explainability technique to overcome the problem of black-box deep learning models. We explore the feasibility of resembling domain expertise using local surrogates in combination with an underlying V+L model to generate multi-modal visual and language explanations. We demonstrate that such explanations can serve as helpful feedback in guiding model training for data scientists and machine learning engineers in the field.
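
To make the local surrogate idea concrete, the following is a minimal sketch of a LIME-style procedure, not the authors' implementation. It assumes the input has been split into segments (for example image regions or text tokens) and that the black-box model can be queried on perturbed inputs. The names `explain_locally` and `toy_model` are hypothetical, and `toy_model` is a stand-in for a real V+L model.

```python
import numpy as np

def explain_locally(black_box, n_segments, n_samples=500, kernel_width=0.25, seed=0):
    """Fit a weighted linear surrogate around a single input.

    black_box: hypothetical callable mapping a binary mask (1 = segment kept)
               to a scalar score; it stands in for the V+L model.
    Returns one importance weight per segment.
    """
    rng = np.random.default_rng(seed)
    # Sample perturbations: each row keeps a random subset of segments.
    masks = rng.integers(0, 2, size=(n_samples, n_segments))
    masks[0] = 1  # include the unperturbed input itself
    scores = np.array([black_box(m) for m in masks], dtype=float)

    # Weight each sample by its proximity to the original input,
    # measured as the fraction of segments that were removed.
    distance = 1.0 - masks.mean(axis=1)
    weights = np.exp(-(distance ** 2) / kernel_width ** 2)

    # Weighted least squares with an intercept column.
    X = np.hstack([masks, np.ones((n_samples, 1))])
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], scores * sw, rcond=None)
    return coef[:-1]  # drop the intercept

def toy_model(mask):
    # Assumed stand-in for a black-box model: only segments 1 and 3 matter.
    return 2.0 * mask[1] + 0.5 * mask[3]

importance = explain_locally(toy_model, n_segments=5)
top_segment = int(np.argmax(importance))
```

In a multi-modal setting, the segments with the largest weights would be the image regions or words highlighted to the user; a domain expert can then judge whether those highlights match clinical reasoning.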