IV AI CV LG · Sep 16, 2025

MEGAN: Mixture of Experts for Robust Uncertainty Estimation in Endoscopy Videos

Damola Agbelese, Krishna Chaitanya, Pushpak Pati, Chaitanya Parmar, Pooya Mobadersany, Shreyas Fadnavis, Lindsey Surace, Shadi Yarandi, Louis R. Ghanem, Molly Lucas, Tommaso Mansi, Oana Gabriela Cula

arXiv:2509.12772v1 · 8.61 citations · h-index: 16 · UNSURE@MICCAI

Originality: Incremental advance

AI Analysis

This paper addresses inter-rater variability in AI-based assessment for ulcerative colitis trials and offers a method to improve prediction confidence and calibration. The advance is incremental, since it builds on existing evidential deep learning methods.

The paper tackles unreliable uncertainty quantification in medical AI by proposing MEGAN, a mixture-of-experts model that aggregates predictions and uncertainties from multiple AI experts trained with diverse ground truths. On endoscopy videos for ulcerative colitis severity estimation, it reports a 3.5% improvement in F1-score and a 30.5% reduction in Expected Calibration Error compared to existing methods.
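For readers unfamiliar with the calibration metric, the following is a minimal sketch of Expected Calibration Error in its common equal-width binned form. The function name, the choice of 10 bins, and the toy inputs are assumptions made for illustration; the summary does not state which binning scheme the paper uses.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over confidence bins.

    confidences: predicted probability of the chosen class, one value per sample.
    correct:     1 if the prediction was right, 0 otherwise.
    n_bins:      number of equal-width bins on [0, 1] (an assumed default).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)

    # Interior bin edges; digitize maps each confidence to a bin index 0..n_bins-1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)

    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(ece)

# Toy example (made-up numbers): four predictions at 0.9 confidence, three correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

A lower value means that stated confidence tracks observed accuracy more closely, which is the sense in which a reduction in ECE indicates better calibration.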

Reliable uncertainty quantification (UQ) is essential in medical AI. Evidential Deep Learning (EDL) offers a computationally efficient way to quantify model uncertainty alongside predictions, unlike traditional methods such as Monte Carlo (MC) Dropout and Deep Ensembles (DE). However, these methods often rely on a single expert's annotations as ground truth for model training, overlooking the inter-rater variability in healthcare. To address this issue, we propose MEGAN, a Multi-Expert Gating Network that aggregates uncertainty estimates and predictions from multiple AI experts via EDL models trained with diverse ground truths and modeling strategies. MEGAN's gating network optimally combines predictions and uncertainties from each EDL model, enhancing overall prediction confidence and calibration. We extensively benchmark MEGAN on endoscopy videos for ulcerative colitis (UC) disease severity estimation, assessed by visual labeling of the Mayo Endoscopic Subscore (MES), where inter-rater variability is prevalent. In a large-scale prospective UC clinical trial, MEGAN achieved a 3.5% improvement in F1-score and a 30.5% reduction in Expected Calibration Error (ECE) compared to existing methods. Furthermore, MEGAN facilitated uncertainty-guided sample stratification, reducing the annotation burden and potentially increasing efficiency and consistency in UC trials.
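To make the idea of combining expert predictions and uncertainties concrete, here is a rough sketch under stated assumptions. The `edl_outputs` function follows the Dirichlet formulation commonly used in the EDL literature (evidence plus one gives the concentration parameters, and uncertainty is the number of classes divided by the Dirichlet strength). The `gated_combine` function is purely hypothetical: it uses a softmax-weighted average as a stand-in for a gating network, and the abstract does not describe MEGAN's actual gating architecture. The evidence values and gate logits are made up, with four classes chosen to match the MES grades 0 to 3.

```python
import numpy as np

def edl_outputs(evidence):
    """Class probabilities and uncertainty from non-negative evidence.

    evidence: array of shape (..., n_classes).
    Returns (probs, uncertainty), with uncertainty in (0, 1].
    """
    evidence = np.asarray(evidence, dtype=float)
    alpha = evidence + 1.0                            # Dirichlet concentration
    strength = alpha.sum(axis=-1, keepdims=True)      # Dirichlet strength S
    probs = alpha / strength                          # expected class probabilities
    uncertainty = evidence.shape[-1] / strength[..., 0]  # u = K / S
    return probs, uncertainty

def softmax(x):
    x = np.asarray(x, dtype=float)
    z = np.exp(x - x.max(axis=-1, keepdims=True))     # subtract max for stability
    return z / z.sum(axis=-1, keepdims=True)

def gated_combine(expert_evidence, gate_logits):
    """Hypothetical gate: softmax weights applied to each expert's outputs.

    expert_evidence: shape (n_experts, n_classes).
    gate_logits:     shape (n_experts,), assumed to come from a learned gate.
    """
    probs, unc = edl_outputs(expert_evidence)
    weights = softmax(gate_logits)
    return weights @ probs, float(weights @ unc)

# Made-up evidence from three experts for one video; the third expert has none.
expert_evidence = np.array([
    [8.0, 1.0, 0.0, 0.0],
    [2.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
combined_probs, combined_unc = gated_combine(expert_evidence, np.zeros(3))
```

In this sketch an expert with no evidence yields uniform probabilities and maximal uncertainty, so a gate that down-weights such experts would sharpen the combined prediction. Thresholding `combined_unc` is one plausible way to route uncertain samples to human annotators, though the abstract does not specify the stratification rule used.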
