EndoDINO: A Foundation Model for GI Endoscopy
This work addresses the need for robust, generalizable AI models for gastrointestinal (GI) endoscopy imaging, although the contribution is incremental: it applies existing foundation-model paradigms to a new domain.
To improve generalizability across GI endoscopy tasks, the authors developed EndoDINO, a foundation model pre-trained on a large curated dataset, and report state-of-the-art performance in anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring for ulcerative colitis.
In this work, we present EndoDINO, a foundation model for GI endoscopy tasks that achieves strong generalizability by pre-training on a well-curated image dataset sampled from the largest known GI endoscopy video dataset in the literature. Specifically, we pre-trained ViT models with 1B, 307M, and 86M parameters using datasets ranging from 100K to 10M curated images. Using EndoDINO as a frozen feature encoder, we achieved state-of-the-art performance in anatomical landmark classification, polyp segmentation, and Mayo endoscopic scoring (MES) for ulcerative colitis with only simple decoder heads.
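The frozen-encoder evaluation described above, where only a small head is trained on top of fixed features, can be illustrated with a minimal linear-probe sketch. Everything in it is hypothetical and is not the authors' code. `frozen_encoder` is a fixed random projection that stands in for a pre-trained backbone such as EndoDINO. The data are two synthetic clusters, and `LinearHead` is a plain softmax classifier trained by SGD. The segmentation and MES decoder heads are not shown.

```python
import copy
import math
import random

IN_DIM, FEAT_DIM, N_CLASSES = 4, 8, 2

# Hypothetical stand-in for a frozen pre-trained backbone (e.g. EndoDINO):
# a fixed random projection followed by tanh. These weights are created once
# and are never updated during training.
_enc_rng = random.Random(0)
ENCODER_W = [[_enc_rng.gauss(0.0, 1.0) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]


def frozen_encoder(x):
    """Map a raw input vector to a fixed feature vector (no trainable state)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in ENCODER_W]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LinearHead:
    """The only trainable part: a linear softmax classifier on frozen features."""

    def __init__(self, feat_dim, n_classes):
        self.W = [[0.0] * feat_dim for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def logits(self, f):
        return [sum(w * fi for w, fi in zip(row, f)) + bk for row, bk in zip(self.W, self.b)]

    def predict(self, f):
        z = self.logits(f)
        return max(range(len(z)), key=z.__getitem__)

    def sgd_step(self, f, y, lr):
        p = softmax(self.logits(f))
        for k in range(len(self.W)):
            g = p[k] - (1.0 if k == y else 0.0)  # d(cross-entropy)/d(logit_k)
            for j, fj in enumerate(f):
                self.W[k][j] -= lr * g * fj
            self.b[k] -= lr * g


def train_linear_probe(xs, ys, n_classes, epochs=50, lr=0.1):
    # Features are extracted once; gradients never reach the encoder.
    feats = [frozen_encoder(x) for x in xs]
    head = LinearHead(len(feats[0]), n_classes)
    for _ in range(epochs):
        for f, y in zip(feats, ys):
            head.sgd_step(f, y, lr)
    return head


# Toy data: two noisy clusters centred at +CENTER (class 0) and -CENTER (class 1).
CENTER = [1.0, 1.0, -1.0, -1.0]
data_rng = random.Random(1)
xs, ys = [], []
for _ in range(20):
    for label, sign in ((0, 1.0), (1, -1.0)):
        xs.append([sign * c + data_rng.gauss(0.0, 0.1) for c in CENTER])
        ys.append(label)

encoder_before = copy.deepcopy(ENCODER_W)
head = train_linear_probe(xs, ys, N_CLASSES)
accuracy = sum(head.predict(frozen_encoder(x)) == y for x, y in zip(xs, ys)) / len(xs)
```

Because gradients never reach `ENCODER_W`, the features can be computed once and reused across epochs. This caching is what makes frozen-encoder evaluation much cheaper than full fine-tuning, and it means the resulting scores mainly reflect the quality of the pre-trained representation.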