CV | Oct 24, 2025

Knowledge-Driven Vision-Language Model for Plexus Detection in Hirschsprung's Disease

Youssef Megahed, Atallah Madi, Dina El Demellawy, Adrian D. C. Chan

arXiv:2510.21083v1 | 8.42 citations | h-index: 9

Originality: Incremental advance

AI Analysis

This work addresses physicians' need for more interpretable and clinically relevant diagnostic tools in histopathology. It is an incremental advance, as it builds on existing vision-language models.

The study tackled the problem of detecting myenteric plexus regions in Hirschsprung's disease by integrating expert-derived textual concepts into a vision-language model, achieving an accuracy of 83.9%, precision of 86.6%, and specificity of 87.6%, outperforming CNN-based models.
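For reference, the three reported metrics follow from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fp/tn/fn are true/false positives/negatives, with "positive"
    assumed here to mean a patch labelled as plexus.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        # Fraction of all patches classified correctly.
        "accuracy": (tp + tn) / total,
        # Of the patches predicted positive, the fraction that are positive.
        "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
        # Of the truly negative patches, the fraction predicted negative.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Illustrative counts only (not the paper's data).
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
```

Specificity is worth reporting alongside precision because it measures how rarely non-plexus tissue is flagged, independent of how common plexus regions are in the data.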

Hirschsprung's disease is defined as the congenital absence of ganglion cells in some segment(s) of the colon. The muscle in the affected section cannot make the coordinated movements needed to propel stool, most commonly leading to obstruction. Diagnosis and treatment of this disease require clear identification of the different region(s) of the myenteric plexus, where ganglion cells should be present, on the microscopic view of the tissue slide. While deep learning approaches, such as Convolutional Neural Networks (CNNs), have performed very well in this task, they are often treated as black boxes that yield minimal understanding and may not conform to how a physician makes decisions. In this study, we propose a novel framework that integrates expert-derived textual concepts into a Contrastive Language-Image Pre-training (CLIP)-based vision-language model to guide plexus classification. Prompts are generated by large language models from expert sources (e.g., medical textbooks and papers), reviewed by our team, and then encoded with QuiltNet, so that our approach aligns clinically relevant semantic cues with visual features. Experimental results show that the proposed model outperformed CNN-based models, including VGG-19, ResNet-18, and ResNet-50, across several classification metrics, achieving an accuracy of 83.9%, a precision of 86.6%, and a specificity of 87.6%. These findings highlight the potential of multi-modal learning in histopathology and underscore the value of incorporating expert knowledge for more clinically relevant model outputs.
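The idea of aligning text concepts with visual features can be illustrated with the generic CLIP-style scoring rule: embed the image patch and the class prompts in a shared space, then compare them by cosine similarity. The sketch below is a minimal illustration of that general technique, not the authors' pipeline. The function names, the temperature value, the embedding size, and the random vectors standing in for QuiltNet encodings are all assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def class_probabilities(image_emb, class_text_embs, temperature=0.07):
    """Score one image embedding against per-class prompt embeddings.

    class_text_embs is a list with one (n_prompts, dim) array per class.
    Each class is summarised by the mean of its normalised prompt
    embeddings (a common prompt-ensembling heuristic, assumed here).
    """
    image = l2_normalize(np.asarray(image_emb, dtype=float))
    prototypes = np.stack([
        l2_normalize(l2_normalize(np.asarray(e, dtype=float)).mean(axis=0))
        for e in class_text_embs
    ])
    # Cosine similarity between the image and each class prototype,
    # sharpened by the temperature before the softmax.
    logits = prototypes @ image / temperature
    logits = logits - logits.max()  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()


# Random stand-ins for encoder outputs (hypothetical, 8-dimensional).
rng = np.random.default_rng(0)
plexus_prompts = rng.normal(size=(3, 8))      # e.g. "ganglion cells present ..."
non_plexus_prompts = rng.normal(size=(3, 8))  # e.g. "smooth muscle only ..."
patch_embedding = rng.normal(size=8)

probs = class_probabilities(patch_embedding, [plexus_prompts, non_plexus_prompts])
```

In a real system the embeddings would come from the image and text encoders of a pathology-tuned model rather than a random generator, and the framework described above may train additional components on top of this similarity signal.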
