CV · Mar 16

RARE disease detection from Capsule Endoscopic Videos based on Vision Transformers

arXiv:2603.18045 · 8.0 · h-index: 1
Predicted impact: top 86% in CV · last 90 days · Originality: Synthesis-oriented
AI Analysis

This work addresses automated disease detection in gastrointestinal videos for medical diagnostics, but it is incremental as it applies an existing method to a new dataset.

The paper tackled multi-label classification of 17 anatomical and pathological labels in capsule endoscopic videos using a fine-tuned Vision Transformer, achieving an overall mAP @0.5 of 0.0205 and mAP @0.95 of 0.0196 on a test dataset of three videos.

This work was carried out for the Gastro Competition on multi-label classification from capsule endoscopic videos (CEV). Transformer-based deep learning networks are fine-tuned for this task. The base model, available online, is the Google Vision Transformer (ViT) patch16 at 224 x 224 resolution. In total, 17 labels are classified: mouth, esophagus, stomach, small intestine, colon, z-line, pylorus, ileocecal valve, active bleeding, angiectasia, blood, erosion, erythema, hematin, lymphangioectasis, polyp, and ulcer. On the test dataset of three videos, the overall mAP @0.5 is 0.0205, whereas the overall mAP @0.95 is 0.0196.
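To make the multi-label setup concrete, here is a minimal sketch of how one frame's 17 output scores could be decoded into labels. It assumes an independent sigmoid per label and a 0.5 decision threshold; the abstract states neither, and the function names and example logits are hypothetical.

```python
import math

# The 17 labels listed in the abstract, in the order given there.
LABELS = [
    "mouth", "esophagus", "stomach", "small intestine", "colon",
    "z-line", "pylorus", "ileocecal valve", "active bleeding",
    "angiectasia", "blood", "erosion", "erythema", "hematin",
    "lymphangioectasis", "polyp", "ulcer",
]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_frame(logits, threshold=0.5):
    """Return the labels whose sigmoid probability reaches the threshold.

    Each label is scored independently (multi-label, not softmax), so a
    frame can receive zero, one, or several labels. The threshold value
    is an assumption, not taken from the paper.
    """
    if len(logits) != len(LABELS):
        raise ValueError(f"expected {len(LABELS)} logits, got {len(logits)}")
    return [name for name, z in zip(LABELS, logits) if sigmoid(z) >= threshold]


# Hypothetical logits for a single frame: one anatomical label and one
# pathological label are active at the same time.
example_logits = [-4.0] * len(LABELS)
example_logits[LABELS.index("small intestine")] = 2.5
example_logits[LABELS.index("polyp")] = 0.8

print(decode_frame(example_logits))
```

Independent per-label scoring matters here because an anatomical location (such as small intestine) and a finding (such as polyp) can both be present in the same frame, which a single softmax over all 17 classes could not express.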

