RadFormer: Transformers with Global-Local Attention for Interpretable and Accurate Gallbladder Cancer Detection
This work addresses the problem of accurate and interpretable diagnosis of gallbladder cancer for medical practitioners, offering a potential second-reader tool.
The paper tackles gallbladder cancer detection from ultrasound images by proposing a transformer-based architecture with global-local attention, reporting detection accuracy that surpasses that of human radiologists.
We propose a novel deep neural network architecture to learn interpretable representations for medical image analysis. Our architecture generates global attention for the region of interest, and then learns bag-of-words-style deep feature embeddings with local attention. The global and local feature maps are combined using a contemporary transformer architecture for highly accurate Gallbladder Cancer (GBC) detection from Ultrasound (USG) images. Our experiments indicate that the detection accuracy of our model exceeds that of human radiologists, and they support its use as a second reader for GBC diagnosis. The bag-of-words embeddings allow our model to be probed to generate interpretable explanations for GBC detection that are consistent with those reported in the medical literature. We show that the proposed model not only helps in understanding the decisions of neural network models but also aids in the discovery of new visual features relevant to the diagnosis of GBC. Source code and model will be available at https://github.com/sbasu276/RadFormer
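The three-stage idea described above (global attention selecting a region of interest, a bag of local patch embeddings, and attention-based fusion of the two) can be illustrated with a toy NumPy sketch. This is not the released RadFormer code: the abstract does not specify layers or hyperparameters, so every name here (`roi_from_saliency`, `patch_embeddings`, `self_attention`, `fuse`) is hypothetical, a thresholded saliency map stands in for the learned global attention, a random linear projection stands in for learned deep features, and a single identity-projection self-attention step stands in for the transformer.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def roi_from_saliency(saliency, threshold=0.5):
    """Bounding box (r0, r1, c0, c1) of pixels >= threshold * max saliency.
    Assumption: a saliency map stands in for the learned global attention."""
    mask = saliency >= threshold * saliency.max()
    rows = np.where(mask.any(axis=1))[0]
    cols = np.where(mask.any(axis=0))[0]
    return int(rows[0]), int(rows[-1]) + 1, int(cols[0]), int(cols[-1]) + 1

def patch_embeddings(image, patch, proj):
    """Split a 2-D image into non-overlapping patches and project each one.
    The result is an unordered bag of patch features, shape (num_patches, dim)."""
    h, w = image.shape
    h, w = h - h % patch, w - w % patch  # drop ragged borders
    blocks = image[:h, :w].reshape(h // patch, patch, w // patch, patch)
    flat = blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    return flat @ proj

def self_attention(tokens):
    """Single-head scaled dot-product self-attention with identity projections
    (a stand-in for a full transformer block)."""
    d = tokens.shape[-1]
    weights = softmax(tokens @ tokens.T / np.sqrt(d), axis=-1)
    return weights @ tokens, weights

def fuse(image, saliency, patch=4, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    # Random projection: hypothetical placeholder for learned weights.
    proj = rng.normal(size=(patch * patch, dim))
    r0, r1, c0, c1 = roi_from_saliency(saliency)
    local_tokens = patch_embeddings(image[r0:r1, c0:c1], patch, proj)
    global_token = patch_embeddings(image, patch, proj).mean(axis=0, keepdims=True)
    tokens = np.vstack([global_token, local_tokens])
    fused, weights = self_attention(tokens)
    # Fused global token and its attention over [global, local patches].
    return fused[0], weights[0]
```

In this sketch, the attention weights returned for the global token indicate how strongly each local patch contributes to the fused representation, which is the kind of per-patch signal that could be inspected when probing a model for explanations.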