CVMar 1

Fake It Right: Injecting Anatomical Logic into Synthetic Supervised Pre-training for Medical Segmentation

arXiv:2603.00979v1h-index: 13
Originality Incremental advance
AI Analysis

This provides a data-efficient, privacy-compliant solution for medical segmentation, though it is incremental over existing FDSL approaches.

The paper tackles the problem of data scarcity and privacy concerns in 3D medical segmentation by proposing an anatomy-informed synthetic supervised pre-training framework that replaces generic shapes with anatomical priors, achieving performance gains of 1.74% over FDSL baselines and up to 1.66% over SSL methods on BTCV and MSD datasets.

Vision Transformers (ViTs) excel in 3D medical segmentation but require massive annotated datasets. While Self-Supervised Learning (SSL) mitigates this using unlabeled data, it still faces strict privacy and logistical barriers. Formula-Driven Supervised Learning (FDSL) offers a privacy-preserving alternative by pre-training on synthetic mathematical primitives. However, a critical semantic gap limits its efficacy: generic shapes lack the morphological fidelity, fixed spatial layouts, and inter-organ relationships of real anatomy, preventing models from learning essential global structural priors. To bridge this gap, we propose an Anatomy-Informed Synthetic Supervised Pre-training framework unifying FDSL's infinite scalability with anatomical realism. We replace basic primitives with a lightweight shape bank with de-identified, label-only segmentation masks from 5 subjects. Furthermore, we introduce a structure-aware sequential placement strategy to govern the patch synthesis process. Instead of random placement, we enforce physiological plausibility using spatial anchors for correct localization and a topological graph to manage inter-organ interactions (e.g., preventing impossible overlaps). Extensive experiments on BTCV and MSD datasets demonstrate that our method significantly outperforms state-of-the-art FDSL baselines and SSL methods by 1.74\% and up to 1.66\%, while exhibiting a robust scaling effect where performance improves with increased synthetic data volume. This provides a data-efficient, privacy-compliant solution for medical segmentation. The code will be made publicly available upon acceptance.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes