CV · Mar 1

Fake It Right: Injecting Anatomical Logic into Synthetic Supervised Pre-training for Medical Segmentation

Jiaqi Tang, Mengyan Zheng, Shu Zhang, Fandong Zhang, Qingchao Chen

arXiv:2603.00979v1 · 1.5 · h-index: 14

Originality: Incremental advance

AI Analysis

The framework offers a data-efficient, privacy-compliant solution for medical segmentation, although it is an incremental advance over existing FDSL approaches.

The paper addresses data scarcity and privacy concerns in 3D medical segmentation with an anatomy-informed synthetic supervised pre-training framework that replaces generic shapes with anatomical priors. On the BTCV and MSD datasets, it outperforms FDSL baselines by 1.74% and SSL methods by up to 1.66%.

Vision Transformers (ViTs) excel at 3D medical segmentation but require massive annotated datasets. Self-Supervised Learning (SSL) mitigates this by using unlabeled data, but it still faces strict privacy and logistical barriers. Formula-Driven Supervised Learning (FDSL) offers a privacy-preserving alternative by pre-training on synthetic mathematical primitives. However, a critical semantic gap limits its efficacy: generic shapes lack the morphological fidelity, fixed spatial layouts, and inter-organ relationships of real anatomy, which prevents models from learning essential global structural priors. To bridge this gap, we propose an Anatomy-Informed Synthetic Supervised Pre-training framework that unifies FDSL's infinite scalability with anatomical realism. We replace basic primitives with a lightweight shape bank built from de-identified, label-only segmentation masks of 5 subjects. Furthermore, we introduce a structure-aware sequential placement strategy to govern the patch synthesis process. Instead of placing shapes randomly, we enforce physiological plausibility using spatial anchors for correct localization and a topological graph to manage inter-organ interactions (e.g., preventing impossible overlaps). Extensive experiments on the BTCV and MSD datasets demonstrate that our method significantly outperforms state-of-the-art FDSL baselines and SSL methods by 1.74% and up to 1.66%, respectively, while exhibiting a robust scaling effect in which performance improves as the volume of synthetic data increases. The framework thus provides a data-efficient, privacy-compliant solution for medical segmentation. The code will be made publicly available upon acceptance.
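The structure-aware sequential placement described above can be illustrated with a minimal sketch. It assumes NumPy, toy ellipsoids standing in for the shape-bank masks, hand-picked normalized anchor centres, and a topological graph whose edges simply forbid overlap between organ pairs. The organ names, anchor values, `SHAPE_BANK`, `ANCHORS`, `NO_OVERLAP`, and the `synthesize` function with its parameters are all hypothetical; this is not the authors' implementation, which has not yet been released.

```python
import numpy as np


def ellipsoid_mask(radii):
    """Toy stand-in for a shape-bank mask: a filled ellipsoid with semi-axes `radii` (voxels)."""
    rz, ry, rx = radii
    z, y, x = np.ogrid[-rz:rz + 1, -ry:ry + 1, -rx:rx + 1]
    return (z / rz) ** 2 + (y / ry) ** 2 + (x / rx) ** 2 <= 1.0


# Hypothetical shape bank: organ name -> list of binary masks (one per source subject).
SHAPE_BANK = {
    "liver": [ellipsoid_mask((6, 10, 12)), ellipsoid_mask((7, 9, 11))],
    "spleen": [ellipsoid_mask((4, 5, 4))],
    "kidney_l": [ellipsoid_mask((5, 3, 3))],
}

# Hypothetical spatial anchors: normalized (z, y, x) centre of each organ in the volume.
ANCHORS = {
    "liver": (0.5, 0.45, 0.3),
    "spleen": (0.5, 0.4, 0.75),
    "kidney_l": (0.55, 0.65, 0.7),
}

# Hypothetical topological graph: each edge is an organ pair whose masks must not overlap.
NO_OVERLAP = {("liver", "spleen"), ("spleen", "kidney_l"), ("liver", "kidney_l")}


def forbidden(a, b):
    """True if the graph forbids organs `a` and `b` from sharing voxels."""
    return (a, b) in NO_OVERLAP or (b, a) in NO_OVERLAP


def synthesize(shape=(32, 48, 48), order=("liver", "spleen", "kidney_l"),
               jitter=0.05, max_tries=20, seed=0):
    """Place organs one at a time near their anchors, rejecting forbidden overlaps."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(shape, dtype=np.int16)  # 0 = background
    names = {}  # label id -> organ name, for organs that were placed
    for label_id, organ in enumerate(order, start=1):
        bank = SHAPE_BANK[organ]
        mask = bank[rng.integers(len(bank))]  # sample one subject's mask
        for _ in range(max_tries):
            # Jitter the anchor, then convert it to voxel coordinates.
            centre = [int(round((a + rng.uniform(-jitter, jitter)) * s))
                      for a, s in zip(ANCHORS[organ], shape)]
            # Corner of the mask's bounding box, clipped so the mask stays inside the volume.
            start = [min(max(c - m // 2, 0), s - m)
                     for c, m, s in zip(centre, mask.shape, shape)]
            region = tuple(slice(st, st + m) for st, m in zip(start, mask.shape))
            # Labels already present under the candidate mask (earlier organs only).
            underneath = labels[region][mask]
            clash = any(forbidden(organ, names[int(l)])
                        for l in np.unique(underneath) if l != 0)
            if not clash:
                labels[region][mask] = label_id  # basic slicing gives a view, so this writes into labels
                names[label_id] = organ
                break
    return labels, names
```

Calling `synthesize(seed=s)` with different seeds yields a stream of distinct label volumes, which is one way a formula-driven generator can produce as much synthetic data as needed. In this sketch an organ whose placement is rejected `max_tries` times is simply skipped; a fuller version would presumably also augment the sampled masks and render image intensities from the label map.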

