ZeroSlide: Is Zero-Shot Classification Adequate for Lifelong Learning in Whole-Slide Image Analysis in the Era of Pathology Vision-Language Foundation Models?
This addresses a practical problem for clinics and hospitals in efficiently handling multiple pathology tasks without retraining models, but it is incremental as it builds on existing foundation models.
The study tackled the problem of lifelong learning for whole-slide images by comparing conventional continual-learning methods with vision-language zero-shot classification, finding that further investigation into continual learning strategies is needed to improve performance.
Lifelong learning for whole slide images (WSIs) poses the challenge of training a unified model to perform multiple WSI-related tasks, such as cancer subtyping and tumor classification, in a distributed, continual fashion. This is a practical and applicable problem in clinics and hospitals, as WSIs are large, require storage, processing, and transfer time. Training new models whenever new tasks are defined is time-consuming. Recent work has applied regularization- and rehearsal-based methods to this setting. However, the rise of vision-language foundation models that align diagnostic text with pathology images raises the question: are these models alone sufficient for lifelong WSI learning using zero-shot classification, or is further investigation into continual learning strategies needed to improve performance? To our knowledge, this is the first study to compare conventional continual-learning approaches with vision-language zero-shot classification for WSIs. Our source code and experimental results will be available soon.