CV · May 14

Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging

arXiv:2605.14654 · 35.4
AI Analysis

For medical imaging practitioners, this method improves multi-modal representation learning by exploiting consistent spatial relationships across individuals, offering moderate gains over instance-level self-supervision.

The paper introduces a self-supervised pre-training method for 3D multi-modal medical imaging that leverages cross-instance topological consistency of anatomical structures, achieving average improvements of 1.1% in segmentation and 5.94% in classification across 7 downstream tasks, with improved robustness to missing modalities.

Self-supervised pre-training methods in medical imaging typically treat each individual as an isolated instance, learning representations through augmentation-based objectives or masked reconstruction. These methods often fail to exploit a key characteristic of physiological features: anatomical structures maintain consistent spatial relationships across individuals (instances), such as the thalamus being medial to the basal ganglia, regardless of variations in brain size, shape, or pathology. We propose leveraging this cross-instance topological consistency as a supervisory signal. The main challenge is the inherent variability of medical images, which differ significantly across instances and modalities. To tackle this, we focus on two alignment regimes. (i) Intra-instance: with pixel-level correspondences available, a cross-modal triplet objective explicitly preserves local neighborhood topology. (ii) Inter-instance: without such supervision, we derive pseudo-correspondences to control partial neighborhood alignment and prevent topology collapse across modalities. We validate our approach across 7 downstream multi-modal tasks, achieving average improvements of 1.1% and 5.94% in segmentation and classification tasks, respectively, and demonstrating significantly better robustness when modalities are missing at test time.
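To make the intra-instance regime concrete, here is a minimal sketch of what a cross-modal triplet objective over spatial neighborhoods might look like. It is not the paper's loss: the abstract gives no formula, so the function name `cross_modal_triplet_loss`, the squared Euclidean feature distance, the `radius` that defines a spatial neighborhood, and the `margin` are all illustrative assumptions. The anchor is a voxel feature from modality A, the positive is a modality-B feature at a spatially nearby voxel, and the negative is a modality-B feature at a distant voxel.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def cross_modal_triplet_loss(feat_a, feat_b, coords,
                             radius=1.5, margin=0.2, seed=0):
    """Hypothetical hinge triplet loss over voxel features of two modalities.

    feat_a, feat_b: per-voxel feature vectors from modality A and B,
                    indexed so that feat_a[i] and feat_b[i] share coords[i]
                    (the "pixel-level correspondence" assumption).
    coords:         per-voxel spatial positions, e.g. (x, y, z).
    radius, margin: assumed hyperparameters, not taken from the paper.
    """
    rng = random.Random(seed)
    losses = []
    for i, anchor in enumerate(feat_a):
        # Split voxels into spatial neighbors and non-neighbors of voxel i.
        near = [j for j, c in enumerate(coords)
                if math.dist(coords[i], c) <= radius]
        far = [j for j, c in enumerate(coords)
               if math.dist(coords[i], c) > radius]
        if not far:
            continue  # no negative available for this anchor
        pos = feat_b[rng.choice(near)]  # cross-modal positive (near voxel)
        neg = feat_b[rng.choice(far)]   # cross-modal negative (far voxel)
        # Penalize anchors that are not closer to the positive by `margin`.
        losses.append(max(0.0, sq_dist(anchor, pos)
                          - sq_dist(anchor, neg) + margin))
    return sum(losses) / len(losses) if losses else 0.0


# Toy volume: two spatial clusters of two voxels each.
coords = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (11, 0, 0)]
# Features that mirror the spatial layout in both modalities.
aligned = [[0.0], [0.1], [1.0], [1.1]]
# Features that have collapsed to a single point.
collapsed = [[0.0], [0.0], [0.0], [0.0]]

print(cross_modal_triplet_loss(aligned, aligned, coords))
print(cross_modal_triplet_loss(collapsed, collapsed, coords))
```

In this toy setup, features that mirror the spatial layout incur no penalty, while collapsed features are penalized by the margin on every anchor, which is one simple way a triplet term can discourage topology collapse. The inter-instance regime would additionally need pseudo-correspondences between different subjects, which this sketch does not attempt.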

