RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers
This work addresses automated disease detection in gastrointestinal videos for medical diagnostics, but it is incremental as it applies an existing method to a new dataset.
The paper tackles multi-label classification of 17 anatomical and pathological labels in capsule endoscopic videos using a fine-tuned Vision Transformer, reporting an overall mAP @0.5 of 0.0205 and mAP @0.95 of 0.0196 on a test dataset of three videos.
This work corresponds to the Gastro Competition for multi-label classification from capsule endoscopic videos (CEV). Deep learning networks based on Transformers are fine-tuned for this task. The pretrained base model, available online, is Google's Vision Transformer (ViT) with 16 × 16 patches and a 224 × 224 input resolution. In total, 17 labels are classified: mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. On the test dataset of three videos, the overall mAP @0.5 is 0.0205, whereas the overall mAP @0.95 is 0.0196.
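The following is a minimal, framework-free sketch of how the 17-label output could be decoded per frame and grouped over time. It assumes an independent sigmoid per label with a 0.5 decision threshold, because the abstract does not state the decision rule. It also assumes the label order of the classifier head matches the order listed above. It shows one possible reading of the @0.5 and @0.95 suffixes as temporal IoU thresholds between predicted and ground-truth frame intervals. That reading, like the helper names `decode_frame`, `to_segments` and `temporal_iou`, is hypothetical and not taken from the paper.

```python
import math

# The 17 labels listed in the abstract. ASSUMPTION: this is also the
# output-index order of the classifier head (the paper does not say).
LABELS = [
    "mouth", "esophagus", "stomach", "small intestine", "colon",
    "z-line", "pylorus", "ileocecal valve", "active bleeding",
    "angiectasia", "blood", "erosion", "erythema", "hematin",
    "lymphangioectasis", "polyp", "ulcer",
]

# Standard ViT with 16 x 16 patches at 224 x 224: (224 / 16)^2 = 196 patch
# tokens plus one [CLS] token, whose embedding feeds the classification head.
NUM_TOKENS = (224 // 16) ** 2 + 1  # 197


def sigmoid(x: float) -> float:
    """Independent per-label probability, as used by multi-label heads."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_frame(logits: list[float], threshold: float = 0.5) -> list[str]:
    """Hypothetical helper: map one frame's 17 logits to the labels present.

    Unlike a softmax (single-label) head, each label is thresholded on its
    own, so a frame can be tagged with both a location and a finding.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


def to_segments(flags: list[bool]) -> list[tuple[int, int]]:
    """Hypothetical helper: group consecutive positive frames for one label
    into inclusive (start, end) frame intervals."""
    segments: list[tuple[int, int]] = []
    start = None
    for i, on in enumerate(flags):
        if on and start is None:
            start = i
        elif not on and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(flags) - 1))
    return segments


def temporal_iou(a: tuple[int, int], b: tuple[int, int]) -> float:
    """Intersection over union of two inclusive frame intervals."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


if __name__ == "__main__":
    logits = [-3.0] * len(LABELS)
    logits[LABELS.index("colon")] = 2.0
    logits[LABELS.index("polyp")] = 0.4
    print(decode_frame(logits))  # ['colon', 'polyp']
    print(to_segments([False, True, True, False, True]))  # [(1, 2), (4, 4)]
    print(temporal_iou((1, 2), (2, 4)))  # 0.25
```

Under this reading, a predicted interval would count as a match at @0.5 only if its temporal IoU with a ground-truth interval of the same label is at least 0.5. At @0.95 the predicted interval would have to almost exactly coincide with the ground truth, which is consistent with the @0.95 score being no higher than the @0.5 score.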