CV LG Mar 18

ResNet-50 with Class Reweighting and Anatomy-Guided Temporal Decoding for Gastrointestinal Video Analysis

arXiv:2603.17784 4.4 h-index: 23
AI Analysis

This work addresses the problem of multi-label classification in gastrointestinal videos for medical diagnosis, representing an incremental improvement over existing methods.

The paper tackled gastrointestinal video analysis by developing a pipeline using ResNet-50 with class reweighting and anatomy-guided temporal decoding to predict 17 labels, improving temporal mAP from 0.3801 to 0.4303 on a test set.

We developed a multi-label gastrointestinal video analysis pipeline based on a ResNet-50 frame classifier followed by anatomy-guided temporal event decoding. The system predicts 17 labels, including 5 anatomy classes and 12 pathology classes, from frames resized to 336x336. A major challenge was severe class imbalance, particularly for rare pathology labels. To address this, we used clipped class-wise positive weighting in the training loss, which improved rare-class learning while maintaining stable optimization. At the temporal stage, we found that direct frame-to-event conversion produced fragmented mismatches with the official ground truth. The final submission therefore combined GT-style framewise event composition, anatomy vote smoothing, and anatomy-based pathology gating with a conservative hysteresis decoder. This design improved the final temporal mAP from 0.3801 to 0.4303 on the challenge test set.
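Two of the steps described above, clipped class-wise positive weighting and hysteresis decoding, can be illustrated with a minimal sketch. The abstract does not give the exact formulas, thresholds, or clipping bounds, so everything here is an assumption for illustration: the function names `clipped_pos_weights` and `hysteresis_decode`, the negative-to-positive ratio as the raw weight, the clipping range `[1.0, 10.0]`, and the `high`/`low` thresholds are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def clipped_pos_weights(
    labels: Sequence[Sequence[int]],
    min_weight: float = 1.0,   # hypothetical lower bound
    max_weight: float = 10.0,  # hypothetical upper bound
) -> List[float]:
    """Per-class positive weight = (#negatives / #positives), clipped.

    `labels` is a list of multi-hot rows, one row per frame.
    Clipping keeps very rare classes from dominating the loss.
    """
    n_frames = len(labels)
    n_classes = len(labels[0])
    weights = []
    for c in range(n_classes):
        pos = sum(row[c] for row in labels)
        neg = n_frames - pos
        # A class with no positives gets the maximum weight (assumption).
        raw = neg / pos if pos > 0 else max_weight
        weights.append(min(max(raw, min_weight), max_weight))
    return weights


def hysteresis_decode(
    probs: Sequence[float],
    high: float = 0.6,  # hypothetical threshold to open an event
    low: float = 0.4,   # hypothetical threshold to close an event
    min_len: int = 1,   # hypothetical minimum event length in frames
) -> List[Tuple[int, int]]:
    """Turn per-frame probabilities for one label into (start, end) events.

    An event opens when the probability reaches `high` and stays open
    until it drops below `low`, which avoids fragmenting an event when
    the score briefly dips between the two thresholds.
    """
    events = []
    start = None
    for i, p in enumerate(probs):
        if start is None:
            if p >= high:
                start = i
        elif p < low:
            if i - start >= min_len:
                events.append((start, i - 1))
            start = None
    # Close an event that is still open at the end of the video.
    if start is not None and len(probs) - start >= min_len:
        events.append((start, len(probs) - 1))
    return events


if __name__ == "__main__":
    toy_labels = [[1, 0], [1, 0], [1, 1], [0, 0]]
    print(clipped_pos_weights(toy_labels))
    toy_probs = [0.1, 0.7, 0.5, 0.45, 0.3, 0.2, 0.65, 0.9, 0.1]
    print(hysteresis_decode(toy_probs))
```

In a PyTorch training loop, weights of this kind could be passed as the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`; whether the authors did so is not stated. The anatomy vote smoothing and anatomy-based pathology gating mentioned above are not covered by this sketch.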
