CV · Oct 24, 2024

CapsoNet: A CNN-Transformer Ensemble for Multi-Class Abnormality Detection in Video Capsule Endoscopy

arXiv:2410.18879v3 · 2 citations · h-index: 2 · Has Code

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of automated abnormality classification in medical imaging for gastroenterology, representing an incremental improvement in a domain-specific challenge.

The paper tackled multi-class abnormality detection in video capsule endoscopy frames by developing CapsoNet, an ensemble of CNNs and transformers, achieving a balanced accuracy of 86.34% and mean AUC-ROC of 0.9908 on a validation set.
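Balanced accuracy, the headline metric quoted above, is the mean of per-class recall, so rare abnormality classes count as much as the dominant class. The sketch below is a minimal illustration and not the challenge's official scoring code; the function name `balanced_accuracy` and the toy labels are assumptions made up for this example.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes that appear in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels (not from the paper): 8 majority-class frames, 2 minority.
y_true = ["normal"] * 8 + ["polyp"] * 2
y_pred = ["normal"] * 8 + ["normal", "polyp"]

# Plain accuracy rewards getting the majority class right.
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
# Balanced accuracy averages recall per class: (8/8 + 1/2) / 2.
balanced = balanced_accuracy(y_true, y_pred)
```

On this toy input one missed minority frame barely moves plain accuracy but costs a quarter of the balanced score, which is why the metric suits imbalanced data such as VCE frames.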

We present CapsoNet, a deep learning framework developed for the Capsule Vision 2024 Challenge, designed to perform multi-class abnormality classification in video capsule endoscopy (VCE) frames. CapsoNet leverages an ensemble of convolutional neural networks (CNNs) and transformer-based architectures to capture both local and global visual features. The model was trained and evaluated on a dataset of over 50,000 annotated frames spanning ten abnormality classes, sourced from three public and one private dataset. To address the challenge of class imbalance, we employed focal loss, weighted random sampling, and extensive data augmentation strategies. All models were fully fine-tuned to maximize performance within the ensemble. CapsoNet achieved a balanced accuracy of 86.34 percent and a mean AUC-ROC of 0.9908 on the official validation set, securing Team Seq2Cure 5th place in the competition. Our implementation is available at http://github.com/arnavs04/capsule-vision-2024
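Two of the imbalance remedies named in the abstract, focal loss and weighted random sampling, can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not give the focusing parameter or the weighting scheme, so `gamma=2.0` and inverse-class-frequency weights are assumed here, and all function names are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Multi-class focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed value; gamma=0 reduces to cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def inverse_frequency_weights(labels):
    """Per-sample weight 1 / (count of that sample's class) (assumed scheme)."""
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return [1.0 / counts[y] for y in labels]


# Toy, made-up labels: class 0 is common, class 1 is rare.
labels = [0, 0, 0, 1]
weights = inverse_frequency_weights(labels)

# Draw a resampled epoch of indices; each class is now equally likely per draw.
rng = random.Random(0)
epoch_indices = rng.choices(range(len(labels)), weights=weights, k=8)

# Focal loss down-weights examples the model already classifies confidently.
easy = focal_loss([4.0, 0.0], target=0)
hard = focal_loss([0.0, 4.0], target=0)
```

The `(1 - p_t)^gamma` factor shrinks the loss on well-classified examples, so training focuses on hard, often minority-class, frames, while the sampling weights equalize how often each class is drawn.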
