Revisiting 2D Foundation Models for Scalable 3D Medical Image Classification
The work addresses the need for efficient, versatile classification in clinical workflows. It offers a scalable solution that removes the need for task-specific models, although it builds incrementally on existing techniques for adapting foundation models.
The paper tackles scalable 3D medical image classification with AnyMC3D, a 3D classifier adapted from 2D foundation models. By adding lightweight plugins of about 1M parameters per task to a single frozen backbone, AnyMC3D achieves state-of-the-art performance across 12 diverse tasks, including first place in the VLM3D challenge.
3D medical image classification is essential for modern clinical workflows. Medical foundation models (FMs) have emerged as a promising approach for scaling to new tasks, yet current research suffers from three critical pitfalls: data-regime bias, suboptimal adaptation, and insufficient task coverage. In this paper, we address these pitfalls and introduce AnyMC3D, a scalable 3D classifier adapted from 2D FMs. Our method scales efficiently to new tasks by adding only lightweight plugins (about 1M parameters per task) on top of a single frozen backbone. This versatile framework also supports multi-view inputs, auxiliary pixel-level supervision, and interpretable heatmap generation. We establish a comprehensive benchmark of 12 tasks covering diverse pathologies, anatomies, and modalities, and systematically analyze state-of-the-art 3D classification techniques. Our analysis reveals key insights: (1) effective adaptation is essential to unlock FM potential, (2) general-purpose FMs can match medical-specific FMs if properly adapted, and (3) 2D-based methods surpass 3D architectures for 3D classification. For the first time, we demonstrate the feasibility of achieving state-of-the-art performance across diverse applications using a single scalable framework (including 1st place in the VLM3D challenge), eliminating the need for separate task-specific models.
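The abstract describes a general pattern: one frozen 2D backbone is shared across all tasks, each task adds only a small trainable plugin, and 3D volumes are handled with 2D (slice-wise) processing. The NumPy sketch below illustrates that pattern under stated assumptions. It is not the AnyMC3D implementation. All of the following are hypothetical choices made for illustration: the names `FrozenSliceEncoder` and `TaskPlugin`, the 768-dimensional features, the fixed random projection standing in for a pretrained 2D FM, and the adapter plus attention-pooling design of the plugin.

```python
import numpy as np

FEAT_DIM = 768  # assumed embedding width (ViT-B-sized); not taken from the paper


class FrozenSliceEncoder:
    """Hypothetical stand-in for a frozen 2D foundation model.

    A fixed random projection replaces a real pretrained network so the
    sketch runs without weights; its parameters are never updated.
    """

    def __init__(self, slice_shape, feat_dim=FEAT_DIM, seed=0):
        rng = np.random.default_rng(seed)
        n_pixels = slice_shape[0] * slice_shape[1]
        self.proj = rng.standard_normal((n_pixels, feat_dim)) / np.sqrt(n_pixels)

    def __call__(self, volume):
        # volume: (num_slices, H, W) -> one feature vector per 2D slice
        flat = volume.reshape(volume.shape[0], -1)
        return np.tanh(flat @ self.proj)  # (num_slices, feat_dim)


class TaskPlugin:
    """Hypothetical lightweight per-task head: adapter, attention pooling, classifier."""

    def __init__(self, num_classes, feat_dim=FEAT_DIM, hidden=1280, seed=0):
        rng = np.random.default_rng(seed)
        self.w_in = rng.standard_normal((feat_dim, hidden)) * 0.02
        self.b_in = np.zeros(hidden)
        self.w_att = rng.standard_normal(hidden) * 0.02
        self.b_att = np.zeros(1)
        self.w_out = rng.standard_normal((hidden, num_classes)) * 0.02
        self.b_out = np.zeros(num_classes)

    def num_params(self):
        # Only these arrays would be trained for a new task.
        return sum(p.size for p in (self.w_in, self.b_in, self.w_att,
                                    self.b_att, self.w_out, self.b_out))

    def __call__(self, slice_feats):
        h = np.maximum(slice_feats @ self.w_in + self.b_in, 0.0)  # (S, hidden), ReLU adapter
        scores = h @ self.w_att + self.b_att  # (S,) one relevance score per slice
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()  # softmax over slices
        pooled = weights @ h  # (hidden,) attention-weighted volume embedding
        return pooled @ self.w_out + self.b_out  # (num_classes,) logits


volume = np.random.default_rng(1).standard_normal((32, 16, 16))  # toy 32-slice volume
encoder = FrozenSliceEncoder(slice_shape=(16, 16))
feats = encoder(volume)  # computed once, reused by every task

plugins = {"task_a": TaskPlugin(num_classes=2, seed=1),
           "task_b": TaskPlugin(num_classes=5, seed=2)}
logits = {name: plugin(feats) for name, plugin in plugins.items()}
print({name: out.shape for name, out in logits.items()})  # {'task_a': (2,), 'task_b': (5,)}
print(plugins["task_a"].num_params())  # 988163
```

In this sketch, each new task costs only its plugin. With the assumed sizes, a binary-classification plugin has 988,163 parameters, which is on the order of the roughly 1M per task cited in the abstract. The slice features are computed once per volume and shared across all tasks.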