8.0CVMar 16
RARE disease detection from Capsule Endoscopic Videos based on Vision TransformersX. Gao, C. Chien, G. Liu et al.
This work is corresponding to the Gastro Competition for multi-label classification from capsule endoscopic videos (CEV). Deep learning network based on Transformers are fined-tune for this task. The based online mode is Google Vision Transformer (ViT) batch16 with 224 x 224 resolutions. In total, 17 labels are classified, which are mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. For test dataset of three videos, the overall mAP @0.5 is 0.0205 whereas the overall mAP @0.95 is 0.0196.