Vector Contrastive Learning For Pixel-Wise Pretraining In Medical Vision
This work addresses a crucial bottleneck in developing generalizable medical visual foundation models, though it appears incremental as it builds on existing contrastive learning paradigms.
The paper extends contrastive learning to pixel-wise representation for medical vision by addressing the over-dispersion issue in standard binary contrastive learning. The authors report that the resulting COVER framework significantly improves pixel-wise self-supervised pretraining across 8 tasks, 2 dimensions, and 4 modalities.
Contrastive learning (CL) has become a cornerstone of self-supervised pretraining (SSP) in foundation models; however, extending CL to pixel-wise representation, which is crucial for medical vision, remains an open problem. Standard CL formulates SSP as a binary optimization problem (binary CL), in which the excessive pursuit of feature dispersion leads to an over-dispersion problem: it breaks pixel-wise feature correlation and thus disrupts the intra-class distribution. Our vector CL reformulates CL as a vector regression problem, which enables dispersion quantification in pixel-wise pretraining by modeling feature distances while regressing displacement vectors. To implement this novel paradigm, we propose the COntrast in VEctor Regression (COVER) framework. COVER establishes an extendable vector-based self-learning scheme, enforces a consistent optimization flow from vector regression to distance modeling, and leverages a vector pyramid architecture for granularity adaptation, thus preserving pixel-wise feature correlations in SSP. Extensive experiments across 8 tasks, spanning 2 dimensions and 4 modalities, show that COVER significantly improves pixel-wise SSP, advancing generalizable medical visual foundation models.
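The contrast between binary CL and vector CL can be illustrated with a minimal NumPy sketch for a single pixel. This is not the COVER implementation, which the abstract does not specify. The binary loss below is the standard InfoNCE form. The vector loss is an assumed reading of "modeling feature distances in regressing displacement vectors": feature similarities to candidate pixels are turned into softmax weights, the weighted sum of candidate offsets gives a predicted displacement, and that displacement is regressed onto the known one. All function names, the temperature `tau`, and the L1 penalty are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def binary_info_nce(anchor, positive, negatives, tau=0.1):
    """Standard binary CL (InfoNCE) for one pixel.

    One candidate is the positive and every other one is a negative,
    so the loss only pushes negatives away without saying how far.
    """
    a = l2_normalize(anchor)
    cands = l2_normalize(np.vstack([positive[None, :], negatives]))
    logits = cands @ a / tau
    return float(-np.log(softmax(logits)[0]))

def predicted_displacement(anchor, candidates, offsets, tau=0.1):
    """Assumed vector-CL readout: similarity-weighted mean of candidate offsets.

    anchor:     (C,)   feature of a pixel in view A
    candidates: (K, C) features of K candidate pixels in view B
    offsets:    (K, D) spatial offset of each candidate (D = 2 or 3)
    """
    sims = l2_normalize(candidates) @ l2_normalize(anchor)
    weights = softmax(sims / tau)
    return weights @ offsets

def vector_regression_loss(anchor, candidates, offsets, true_disp, tau=0.1):
    """L1 regression of the predicted displacement onto the known one (assumption)."""
    pred = predicted_displacement(anchor, candidates, offsets, tau)
    return float(np.abs(pred - true_disp).sum())

# Toy example: a 3x3 neighbourhood of candidates with orthogonal features.
offsets = np.array([(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)], dtype=float)
candidates = np.eye(9)            # candidate k has one-hot feature k
match = 2                         # candidate at offset (-1, 1) matches the anchor
anchor = candidates[match].copy()

pred = predicted_displacement(anchor, candidates, offsets, tau=0.01)
loss_vec = vector_regression_loss(anchor, candidates, offsets, offsets[match], tau=0.01)
loss_bin = binary_info_nce(anchor, candidates[match], np.delete(candidates, match, axis=0))
print(pred, loss_vec, loss_bin)
```

In this sketch the regression target depends on all similarity weights at once, so the features of nearby candidates have to carry graded similarity for the predicted vector to land on the target. The binary loss, by contrast, is satisfied once every negative is far from the anchor, which is one way to picture the over-dispersion the abstract describes.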