VLCD: Vision-Language Contrastive Distillation for Accurate and Efficient Automatic Placenta Analysis
This work addresses the efficiency and deployability of AI-based healthcare solutions for placenta analysis, particularly in resource-constrained environments. Its contribution is incremental, however, since it modifies existing vision-language contrastive learning frameworks.
The paper tackles the computational cost of automated placenta analysis by proposing vision-language contrastive distillation (VLCD), which compresses and accelerates the model while matching or surpassing the teacher model's performance. The results also show improved robustness on lower-quality images.
Pathological examination of the placenta is an effective method for detecting and mitigating health risks associated with childbirth. Recent advances in AI have enabled the use of placenta photographs and pathology reports for detecting and classifying signs of childbirth-related pathologies. However, existing automated methods are computationally expensive, which limits their deployability. We propose two modifications to vision-language contrastive learning (VLC) frameworks to enhance their accuracy and efficiency: (1) text-anchored vision-language contrastive knowledge distillation (VLCD), a new knowledge distillation strategy for medical VLC pretraining, and (2) unsupervised predistillation on a large natural-image dataset for improved initialization. Our approach distills efficient neural networks that match or surpass the teacher model in performance while achieving model compression and acceleration. Our results show that unsupervised predistillation improves the performance and robustness of our approach, specifically for lower-quality images. VLCD is an effective way to improve the efficiency and deployability of medical VLC approaches, making AI-based healthcare solutions more accessible, especially in resource-constrained environments.
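The abstract does not give the VLCD objective, so the following is only a minimal sketch of one plausible reading of "text-anchored" distillation, not the authors' loss. It assumes that the teacher's text embeddings act as fixed anchors and that the student image encoder is trained to reproduce the teacher's image-to-text similarity distribution via a KL divergence. The function name `text_anchored_distillation_loss`, the temperature value, and the use of KL divergence are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def text_anchored_distillation_loss(student_img, teacher_img, text_anchors,
                                    temperature=0.07):
    """Hypothetical loss: KL(teacher || student) over image-to-text similarities.

    student_img:  (batch, dim) image embeddings from the small student encoder
    teacher_img:  (batch, dim) image embeddings from the frozen teacher encoder
    text_anchors: (n_text, dim) text embeddings from the frozen teacher text encoder
    temperature:  assumed softmax temperature, not a value from the paper
    """
    s = l2_normalize(student_img)
    t = l2_normalize(teacher_img)
    a = l2_normalize(text_anchors)

    # Each image is described by how similar it is to every text anchor.
    log_p_student = log_softmax(s @ a.T / temperature)
    log_p_teacher = log_softmax(t @ a.T / temperature)
    p_teacher = np.exp(log_p_teacher)

    # Per-image KL divergence, averaged over the batch.
    kl = (p_teacher * (log_p_teacher - log_p_student)).sum(axis=-1)
    return float(kl.mean())


# Toy usage with random embeddings standing in for real encoder outputs.
rng = np.random.default_rng(0)
teacher_img = rng.normal(size=(4, 8))
student_img = rng.normal(size=(4, 8))
text_anchors = rng.normal(size=(4, 8))

loss_same = text_anchored_distillation_loss(teacher_img, teacher_img, text_anchors)
loss_diff = text_anchored_distillation_loss(student_img, teacher_img, text_anchors)
```

In this sketch the loss is zero when the student reproduces the teacher's embeddings exactly and grows as the student's similarities to the text anchors drift away from the teacher's. Under the same assumptions, a predistillation stage could apply an image-only matching objective on unlabeled natural images before this step, but the abstract does not specify that objective.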