Exploring visual language models as a powerful tool in the diagnosis of Ewing Sarcoma
This work addresses the challenge of accurately diagnosing Ewing Sarcoma, a health concern particularly among adolescents, by applying AI to histopathology images. The contribution is incremental in that it applies existing methods to a new domain.
This study explores visual language models for diagnosing Ewing Sarcoma from histopathological images. It finds that vision-language supervision, adapted with an in-domain dataset, improves diagnostic accuracy over fully-supervised ImageNet pre-training while reducing the number of trainable parameters and the computational cost.
Ewing Sarcoma (ES), characterized by a high density of small round blue cells without structural organization, presents a significant health concern, particularly among adolescents aged 10 to 19. Artificial intelligence-based systems for the automated analysis of histopathological images show promise for supporting an accurate diagnosis of ES. In this context, this study explores, to our knowledge for the first time, the feature extraction ability of different pre-training strategies for distinguishing ES from other soft tissue or bone sarcomas with similar morphology in digitized tissue microarrays. Vision-language supervision (VLS) is compared to fully-supervised ImageNet pre-training within a multiple instance learning paradigm. Our findings indicate a substantial improvement in diagnostic accuracy with the adaptation of VLS using an in-domain dataset. Notably, these models not only enhance the accuracy of the predicted classes but also drastically reduce the number of trainable parameters and the computational costs.
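
The abstract does not describe how patch-level features are aggregated, so the following is only a minimal sketch of one common multiple instance learning setup, not the authors' implementation. It assumes that patch embeddings come from a frozen pre-trained encoder (replaced here by random toy vectors), that a bag corresponds to one tissue core, that aggregation uses a simple attention pooling, and that the task is binary. All names (`attention_pool`, `predict_bag`, `attn_vector`, `class_weights`) and sizes are hypothetical. The sketch illustrates why such a pipeline can have few trainable parameters: only the pooling vector and the classifier weights would be learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attention_pool(bag, attn_vector):
    """Collapse a bag of patch embeddings into a single embedding.

    bag: list of patch embeddings (lists of floats) from a frozen encoder.
    attn_vector: hypothetical learnable vector that scores each patch.
    Returns the pooled embedding and the per-patch attention weights.
    """
    weights = softmax([dot(patch, attn_vector) for patch in bag])
    dim = len(bag[0])
    pooled = [sum(w * patch[d] for w, patch in zip(weights, bag))
              for d in range(dim)]
    return pooled, weights


def predict_bag(bag, attn_vector, class_weights):
    """Return class probabilities for a whole bag (assumed: one tissue core)."""
    pooled, _ = attention_pool(bag, attn_vector)
    logits = [dot(pooled, w) for w in class_weights]
    return softmax(logits)


# Toy stand-ins for frozen-encoder outputs; sizes are arbitrary assumptions.
rng = random.Random(0)
EMBED_DIM, NUM_PATCHES, NUM_CLASSES = 8, 5, 2  # binary task is an assumption

bag = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(NUM_PATCHES)]
attn_vector = [rng.gauss(0, 1) for _ in range(EMBED_DIM)]
class_weights = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)]
                 for _ in range(NUM_CLASSES)]

pooled, weights = attention_pool(bag, attn_vector)
probs = predict_bag(bag, attn_vector, class_weights)

# Only the pooling vector and classifier are trainable; the encoder is frozen.
trainable = len(attn_vector) + sum(len(w) for w in class_weights)
print("class probabilities:", probs)
print("trainable parameters in this toy head:", trainable)
```

In this toy configuration the trainable head has 8 + 2 × 8 = 24 parameters regardless of how large the frozen encoder is, which is the general mechanism by which frozen-feature pipelines keep training cost low.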