IVCVDec 24, 2025

A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI

arXiv:2512.21372v15.1h-index: 3
Originality Incremental advance
AI Analysis

This provides an incremental improvement for medical diagnostics in resource-constrained clinical environments by balancing efficiency and accuracy.

This study tackled the challenge of accurately classifying gastrointestinal diseases from endoscopic imagery by developing a hybrid dual-stream deep learning framework using teacher-student knowledge distillation, achieving near-perfect accuracies of 0.9978 and 0.9928 on two datasets with an average AUC of 1.0000.

The accurate classification of gastrointestinal diseases from endoscopic and histopathological imagery remains a significant challenge in medical diagnostics, mainly due to the vast data volume and subtle variation in inter-class visuals. This study presents a hybrid dual-stream deep learning framework built on teacher-student knowledge distillation, where a high-capacity teacher model integrates the global contextual reasoning of a Swin Transformer with the local fine-grained feature extraction of a Vision Transformer. The student network was implemented as a compact Tiny-ViT structure that inherits the teacher's semantic and morphological knowledge via soft-label distillation, achieving a balance between efficiency and diagnostic accuracy. Two carefully curated Wireless Capsule Endoscopy datasets, encompassing major GI disease classes, were employed to ensure balanced representation and prevent inter-sample bias. The proposed framework achieved remarkable performance with accuracies of 0.9978 and 0.9928 on Dataset 1 and Dataset 2 respectively, and an average AUC of 1.0000, signifying near-perfect discriminative capability. Interpretability analyses using Grad-CAM, LIME, and Score-CAM confirmed that the model's predictions were grounded in clinically significant tissue regions and pathologically relevant morphological cues, validating the framework's transparency and reliability. The Tiny-ViT demonstrated diagnostic performance with reduced computational complexity comparable to its transformer-based teacher while delivering faster inference, making it suitable for resource-constrained clinical environments. Overall, the proposed framework provides a robust, interpretable, and scalable solution for AI-assisted GI disease diagnosis, paving the way toward future intelligent endoscopic screening that is compatible with clinical practicality.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes