Exploration of Interpretability Techniques for Deep COVID-19 Classification using Chest X-ray Images
This work addresses the need for early and accurate diagnosis of COVID-19 to limit its spread, but it is incremental as it applies existing models and interpretability methods to a specific medical imaging task.
The study tackled the problem of classifying COVID-19, pneumonia, and healthy subjects from chest X-ray images using deep learning models and their ensemble, achieving a mean Micro-F1 score of 0.89 for COVID-19 classification with the ensemble. It explored interpretability techniques to compare model performance and found ResNets to be the most interpretable.
The outbreak of COVID-19 has shocked the entire world with its rapid spread and has challenged many sectors. One of the most effective ways to limit its spread is the early and accurate diagnosis of infected patients. Medical imaging, such as X-ray and Computed Tomography (CT), combined with the potential of Artificial Intelligence (AI), plays an essential role in supporting medical personnel in the diagnosis process. Thus, in this article, five different deep learning models (ResNet18, ResNet34, InceptionV3, InceptionResNetV2 and DenseNet161) and their ensemble, formed by majority voting, were used to classify COVID-19, pneumonia and healthy subjects from chest X-ray images. Multilabel classification was performed to predict multiple pathologies for each patient, if present. The interpretability of each network was thoroughly studied using local interpretability methods (occlusion, saliency, input × gradient, guided backpropagation, integrated gradients, and DeepLIFT) and a global technique (neuron activation profiles). The mean Micro-F1 score of the individual models for COVID-19 classification ranges from 0.66 to 0.875, and is 0.89 for the ensemble of the models. The qualitative results showed that the ResNets were the most interpretable models. This research demonstrates the importance of using interpretability methods to compare different models before deciding which model performs best.
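To make the ensemble and the metric concrete, the following is a minimal sketch of per-label majority voting over multilabel predictions, followed by a Micro-F1 computation. It is an illustration under stated assumptions, not the authors' implementation: the label order, the toy predictions, the strict-majority tie rule, and the function names `majority_vote` and `micro_f1` are all hypothetical, and the abstract does not specify how thresholds or ties were handled.

```python
from typing import List

# Hypothetical label order; the abstract names the classes but not their encoding.
LABELS = ["covid19", "pneumonia", "healthy"]

Predictions = List[List[int]]  # one binary label vector per sample


def majority_vote(model_preds: List[Predictions]) -> Predictions:
    """Combine binary multilabel predictions from several models.

    A label is switched on for a sample when strictly more than half
    of the models predict it (an assumed tie rule).
    """
    n_models = len(model_preds)
    n_samples = len(model_preds[0])
    n_labels = len(model_preds[0][0])
    ensemble = []
    for i in range(n_samples):
        row = []
        for j in range(n_labels):
            votes = sum(preds[i][j] for preds in model_preds)
            row.append(1 if 2 * votes > n_models else 0)
        ensemble.append(row)
    return ensemble


def micro_f1(y_true: Predictions, y_pred: Predictions) -> float:
    """Micro-averaged F1: pool TP, FP and FN over all samples and labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data (invented for illustration): 3 samples, 5 models.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
model_preds = [
    [[1, 0, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [0, 1, 0], [0, 1, 0]],
    [[0, 1, 0], [0, 1, 0], [0, 0, 1]],
    [[1, 0, 0], [1, 0, 0], [0, 0, 1]],
    [[1, 1, 0], [0, 1, 0], [0, 0, 0]],
]

ensemble_pred = majority_vote(model_preds)
ensemble_score = micro_f1(y_true, ensemble_pred)
single_scores = [micro_f1(y_true, preds) for preds in model_preds]

print("ensemble prediction:", ensemble_pred)
print("ensemble Micro-F1:", ensemble_score)
print("per-model Micro-F1:", single_scores)
```

In this toy case the individual models each make different mistakes, so the vote can recover the correct labels even though most single models score lower. This is the intuition behind an ensemble outperforming its members, though the toy numbers say nothing about the results reported above.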