CV · May 6

Deep Reprogramming Distillation for Medical Foundation Models

Siyuan Du, Yuhang Zhou, Haolin Li, Jiangchao Yao, Haishuai Wang, Hui Lin, Ya Zhang, Yanfeng Wang

arXiv:2605.04447 · 58.3

AI Analysis

For medical AI practitioners, DRD provides a method to efficiently adapt large foundation models to diverse downstream tasks with improved performance and reduced computational cost.

The paper proposes Deep Reprogramming Distillation (DRD) to adapt medical foundation models to specific downstream tasks, bridging domain gaps and enabling efficient knowledge transfer to lightweight models. DRD outperforms previous parameter-efficient fine-tuning (PEFT) and knowledge distillation (KD) methods across 18 medical tasks spanning 2D/3D classification and segmentation.
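For readers new to the KD baselines that DRD is compared against, here is a minimal NumPy sketch of the conventional logit-based distillation loss (Hinton et al., 2015). It is not DRD. The function names, the default temperature of 4, and the 0.5 weighting are illustrative assumptions. The KL term is only defined when teacher and student produce logits over the same classes, which is the "same task" assumption the paper attributes to classic KD.

```python
import numpy as np


def log_softmax(logits: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Numerically stable log-softmax over the last axis of temperature-scaled logits."""
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the row max to avoid overflow in exp
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def kd_loss(student_logits: np.ndarray,
            teacher_logits: np.ndarray,
            labels: np.ndarray,
            temperature: float = 4.0,   # illustrative default, not from the paper
            alpha: float = 0.5) -> float:  # illustrative weighting, not from the paper
    """Classic logit distillation: hard-label cross-entropy plus softened KL(teacher || student).

    Both logit arrays have shape (batch, num_classes) over the SAME label set,
    i.e. teacher and student must solve the same task.
    """
    if student_logits.shape != teacher_logits.shape:
        raise ValueError("teacher and student must predict over the same classes")
    # Soft-target term: KL divergence between temperature-softened distributions.
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    kl = np.sum(np.exp(log_p_t) * (log_p_t - log_p_s), axis=-1).mean()
    # Hard-label term: ordinary cross-entropy at temperature 1.
    log_probs = log_softmax(student_logits)
    ce = -log_probs[np.arange(len(labels)), labels].mean()
    # Scaling by T**2 keeps soft-target gradients on a comparable scale across temperatures.
    return float(alpha * ce + (1.0 - alpha) * temperature**2 * kl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(8, 5))   # hypothetical teacher logits: 8 samples, 5 classes
    student = rng.normal(size=(8, 5))   # hypothetical student logits over the same 5 classes
    labels = rng.integers(0, 5, size=8)
    print(kd_loss(student, teacher, labels))
```

Because the loss operates on class probabilities, it cannot directly transfer knowledge when the downstream student solves a different task (for example, segmentation instead of the teacher's pre-training objective), which is part of the motivation the abstract gives for DRD.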

Medical foundation models pre-trained on large-scale datasets have shown strong, versatile performance. However, adapting them to specific medical scenarios remains challenging because of the gap between pre-training and downstream tasks and because of real-world constraints on computation and inference speed. Existing techniques that could address this challenge each suffer from intrinsic limitations. For example, knowledge distillation (KD) assumes that the teacher and student models share the same task, training strategy, and model architecture family, while prevalent parameter-efficient fine-tuning (PEFT) does not achieve personalized, lightweight deployment. Even combining PEFT with KD struggles to reconcile differences in model architecture and training strategy between teacher and student, which leads to inefficient knowledge transfer. In this study, we propose a novel framework, Deep Reprogramming Distillation (DRD), to address this general adaptation challenge. Specifically, DRD introduces a reprogramming module that, on the one hand, bridges the domain and task discrepancy between pre-training and downstream scenarios and, on the other hand, enables efficient, student-friendly distillation from foundation models to lightweight downstream models. Furthermore, to mitigate variability across training conditions, we design a centered kernel alignment (CKA) distillation method that promotes robust knowledge transfer. Empirical results show that DRD surpasses previous PEFT and KD methods on 18 medical downstream tasks with different foundation models, covering 2D/3D classification and 2D/3D segmentation.
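The abstract does not specify how DRD's CKA distillation is formulated. As a rough illustration only, the sketch below computes linear CKA (Kornblith et al., 2019) between a batch of student features and a batch of teacher features and turns it into a loss, 1 - CKA. The linear kernel, the function names, and the loss form are assumptions, not the authors' implementation; a real training loop would write this in an autodiff framework so the loss can be backpropagated. Because CKA compares the sample-by-sample similarity structure of two representations, the teacher and student feature widths need not match, and the score is invariant to orthogonal transformations and isotropic scaling of either feature set.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between feature matrices of shape (n_samples, d_x) and (n_samples, d_y).

    d_x and d_y may differ, so a narrow student layer can be compared with a
    wide teacher layer without a projection head.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must contain the same samples (rows)")
    # Center every feature dimension over the batch (the "centered" in CKA).
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Cross-similarity term: ||Y^T X||_F^2 equals <XX^T, YY^T> (linear-kernel HSIC up to a constant).
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    # Self-similarity terms used for normalization; assumes neither input is constant.
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_distillation_loss(student_feats: np.ndarray, teacher_feats: np.ndarray) -> float:
    """Hypothetical distillation objective: 0 when the representations align perfectly."""
    return 1.0 - linear_cka(student_feats, teacher_feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(512, 256))        # hypothetical teacher embeddings for one batch
    q, _ = np.linalg.qr(rng.normal(size=(256, 256)))
    rotated = 3.0 * teacher @ q                  # rotated and rescaled copy of the teacher features
    unrelated = rng.normal(size=(512, 32))       # narrower, independent "student" features
    print(cka_distillation_loss(rotated, teacher))    # close to 0: CKA ignores rotation and scale
    print(cka_distillation_loss(unrelated, teacher))  # noticeably larger: no shared structure
```

Using a similarity-structure loss rather than a direct feature-matching loss (such as mean squared error on activations) is one plausible way to tolerate architectural mismatch between teacher and student, since no alignment of individual feature channels is required.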

