Medical Multimodal Classifiers Under Scarce Data Condition
This work addresses the challenge of training on the small datasets typical of individual medical institutes while assisting radiologists in anomaly detection, though it is incremental in that it builds on existing transfer learning and integrated gradients methods.
The authors tackled the problem of classifying patients as abnormal or normal using a deep multimodal model under scarce data conditions, achieving accuracy improvements of 4% and 7% over the individual text and image models, respectively, averaged over 50 epochs.
Data is one of the essential ingredients powering deep learning research. Small datasets, especially those specific to individual medical institutes, pose challenges for the training stage of deep learning. This work aims to develop a practical deep multimodal model that accurately classifies patients into abnormal and normal categories and assists radiologists in detecting visual and textual anomalies by locating areas of interest. Anomaly detection is achieved through a novel technique that extends the integrated gradients methodology with an unsupervised clustering algorithm. This technique also introduces a tuning parameter that trades off true positive signals against the suppression of false positive signals in the detection process. To overcome the challenges of a small training dataset, which contains only 3K paired frontal X-ray images and medical reports, we adopt transfer learning for the multimodal model, which concatenates the layers of the image and text submodels. The image submodel was trained on the large ChestX-ray14 dataset, while the text submodel uses a pretrained word embedding layer transferred from a hospital-specific corpus. Experimental results show that our multimodal model improves classification accuracy by 4% and 7%, averaged over 50 epochs, compared to the individual text and image models, respectively.
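To make the detection idea concrete, the following is a minimal sketch, not the authors' implementation, of integrated gradients followed by unsupervised clustering of the attribution magnitudes. Everything specific here is an assumption made for illustration: the toy logistic model standing in for the deep multimodal network, the zero baseline, the use of 1-D k-means as the clustering algorithm, and the hypothetical `keep` parameter, which plays the role of a tuning knob by deciding how many of the strongest clusters are reported as areas of interest.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def model(x, w, b):
    """Toy differentiable classifier (assumption): logistic regression score."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def model_grad(x, w, b):
    """Analytic gradient of the score with respect to the input features."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]

def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight-line path from the baseline to the input.
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        grad = model_grad(point, w, b)
        total = [t + g for t, g in zip(total, grad)]
    return [(xi - x0) * t / steps for xi, x0, t in zip(x, baseline, total)]

def kmeans_1d(values, k=3, iters=20):
    """Plain 1-D k-means (k >= 2) with evenly spaced initial centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = sum(members) / len(members)
    return labels, centers

def detect_anomalies(attributions, k=3, keep=1):
    """Return indices of features in the `keep` strongest attribution clusters.

    `keep` is a hypothetical tuning parameter: a larger value recovers more
    true signals at the cost of admitting more weak, possibly spurious ones.
    """
    strength = [abs(a) for a in attributions]
    labels, centers = kmeans_1d(strength, k)
    ranked = sorted(range(k), key=lambda c: centers[c], reverse=True)
    selected = set(ranked[:keep])
    return [i for i, lab in enumerate(labels) if lab in selected]

if __name__ == "__main__":
    w, b = [3.0, 0.1, -0.05, 2.5, 0.0, 0.8], -1.0
    x, baseline = [1.0] * 6, [0.0] * 6
    attr = integrated_gradients(x, baseline, w, b)
    print(detect_anomalies(attr, keep=1))
    print(detect_anomalies(attr, keep=2))
```

In this sketch, raising `keep` widens the set of flagged features, which mirrors the trade-off between retaining true positive signals and suppressing false positives described above. In a real image or text setting the features would be pixels or tokens, and the gradients would come from automatic differentiation rather than a closed-form expression.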