Endoscopy Classification Model Using Swin Transformer and Saliency Map
This work addresses the time-consuming, expertise-dependent nature of endoscopy for colon cancer diagnosis and represents an incremental improvement in medical image analysis.
The authors address endoscopic image classification for colon cancer diagnosis by proposing a multi-label method that combines a Swin transformer with a modified VGG16 aided by saliency maps, and they report better performance than state-of-the-art methods in quantitative evaluations.
Endoscopy is a valuable tool for the early diagnosis of colon cancer. However, it is time-consuming and depends on the expertise of endoscopists. In this work, we propose a new multi-label classification method for endoscopic images that combines two complementary views of learning: local and global. The model consists of a Swin transformer branch and a CNN branch based on a modified VGG16 model. To help the CNN branch learn, saliency maps are concatenated with the endoscopy images. The results demonstrate that the method performs well on endoscopic medical images by exploiting both local and global image features. Furthermore, quantitative evaluations show that the proposed method outperforms state-of-the-art works.
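The data flow described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and it rests on assumptions the abstract does not confirm: the saliency map is appended to the image as an extra channel, the two branches are fused by concatenating their feature vectors, and a linear head with one sigmoid per label produces the multi-label output. The function names (`concat_saliency`, `fuse_features`, `multilabel_predict`), the toy feature values, and the weights are all hypothetical stand-ins for the real Swin transformer and VGG16 outputs.

```python
import math


def concat_saliency(image, saliency):
    """Append a saliency map as an extra channel (assumed channel-wise concat).

    image: list of C channels, each an H x W nested list.
    saliency: a single H x W nested list.
    """
    h, w = len(saliency), len(saliency[0])
    for channel in image:
        if len(channel) != h or any(len(row) != w for row in channel):
            raise ValueError("image and saliency map must share H x W")
    return image + [saliency]


def fuse_features(global_feats, local_feats):
    """Concatenate the two branches' feature vectors (assumed fusion rule)."""
    return list(global_feats) + list(local_feats)


def multilabel_predict(features, weights, biases, threshold=0.5):
    """Apply a linear head with one independent sigmoid per label.

    Unlike softmax, independent sigmoids let several labels be active at once.
    """
    probs = []
    for w_row, b in zip(weights, biases):
        logit = sum(w * x for w, x in zip(w_row, features)) + b
        probs.append(1.0 / (1.0 + math.exp(-logit)))
    return probs, [p >= threshold for p in probs]


# Toy 2 x 2 RGB image and saliency map.
rgb = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(3)]
sal = [[0.0, 1.0], [1.0, 0.0]]
cnn_input = concat_saliency(rgb, sal)  # 4 channels: R, G, B, saliency

# Hypothetical branch outputs standing in for real network features.
global_feats = [0.5, -0.2]       # stand-in for the Swin transformer branch
local_feats = [1.0, 0.3, -0.7]   # stand-in for the CNN branch
fused = fuse_features(global_feats, local_feats)

# Hypothetical head weights for two labels over the 5 fused features.
weights = [[1.0, 0.0, 1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0, 1.0]]
biases = [0.0, 0.0]
probs, labels = multilabel_predict(fused, weights, biases)
```

In a real model the two feature vectors would come from the trained branches and the head weights would be learned; the sketch only shows how the pieces could connect.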