Transformer-Based Wireless Capsule Endoscopy Bleeding Tissue Detection and Classification
This work addresses a domain-specific medical imaging problem for gastrointestinal diagnostics, with incremental improvements using existing methods on new data.
The paper tackles the problem of automatically detecting and classifying bleeding tissues in Wireless Capsule Endoscopy videos by designing an end-to-end transformer-based model, achieving classification accuracy of 98.28% and detection mAP of 0.7328, earning third place in a challenge.
Informed by the success of the transformer model in various computer vision tasks, we design an end-to-end trainable model for the automatic detection and classification of bleeding and non-bleeding frames extracted from Wireless Capsule Endoscopy (WCE) videos. Based on the DETR model, our model uses ResNet50 for feature extraction, a transformer encoder-decoder for bleeding and non-bleeding region detection, and a feedforward neural network for classification. Trained end-to-end on the Auto-WCEBleedGen Version 1 challenge training set, our model performs both the detection and classification tasks as a single unit. On the Auto-WCEBleedGen Version 1 validation set, our model achieves a classification accuracy, recall, and F1-score of 98.28%, 96.79%, and 98.37%, respectively. For detection, we record an average precision (AP@0.5) of 0.7447 and a mean average precision (mAP) of 0.7328. These results earned us 3rd place in the challenge. Our code is publicly available at https://github.com/BasitAlawode/WCEBleedGen.
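To make the reported metrics concrete, the following minimal Python sketch shows how accuracy, recall, and F1-score are typically computed from frame-level confusion counts, and how a predicted box is usually judged correct at an IoU threshold of 0.5, the criterion behind AP@0.5. It is an illustration under stated assumptions, not the challenge's evaluation code: the function names, the (x_min, y_min, x_max, y_max) box format, the "IoU >= 0.5" convention, and all counts and boxes are hypothetical toy values chosen for this example.

```python
# Illustrative sketch only. The counts and boxes below are made-up toy values,
# not data or results from the Auto-WCEBleedGen challenge.


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, recall, and F1 for the positive (bleeding) class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "f1": f1}


def box_iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, iou_threshold=0.5):
    """Assumed convention: a detection matches if IoU >= the threshold."""
    return box_iou(pred_box, gt_box) >= iou_threshold


# Toy confusion counts for 20 hypothetical frames.
toy_metrics = classification_metrics(tp=8, fp=1, fn=2, tn=9)
print(toy_metrics)

# Toy ground-truth box and two hypothetical predictions.
gt = (0, 0, 10, 10)
print(is_true_positive((2, 0, 12, 10), gt))  # large overlap with gt
print(is_true_positive((5, 0, 15, 10), gt))  # smaller overlap with gt
```

AP@0.5 is then obtained by ranking detections by confidence, marking each as a true or false positive with a test like the one above, and computing the area under the resulting precision-recall curve. mAP averages such AP values, commonly over classes and, in COCO-style evaluation, over several IoU thresholds. Which averaging the challenge used is not stated in this abstract.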