Towards Robust In-Context Learning for Medical Image Segmentation via Data Synthesis
This work addresses the data bottleneck that limits robust In-Context Learning in medical image segmentation, offering a domain-specific, incremental improvement over existing data synthesis methods.
The paper tackles data scarcity in In-Context Learning for medical image segmentation by proposing SynthICL, a data synthesis framework that uses domain randomization and anatomical priors to generate diverse and realistic data. Models trained with this data show performance gains of up to 63% in average Dice and improved generalization to unseen domains.
The rise of In-Context Learning (ICL) for universal medical image segmentation has created an unprecedented demand for large-scale, diverse training datasets, exacerbating the long-standing problem of data scarcity. While data synthesis offers a promising solution, existing methods often fail to achieve both high data diversity and a domain distribution suitable for medical data. To bridge this gap, we propose \textbf{SynthICL}, a novel data synthesis framework built upon domain randomization. SynthICL ensures realism by leveraging anatomical priors from real-world datasets, generates diverse anatomical structures to cover a broad data distribution, and explicitly models inter-subject variations to create data cohorts suitable for ICL. Extensive experiments on four held-out datasets validate our framework's effectiveness, showing that models trained with our data achieve performance gains of up to 63\% in average Dice and substantially enhanced generalization to unseen anatomical domains. Our work helps mitigate the data bottleneck for ICL-based segmentation, paving the way for robust models. Our code and the generated dataset are publicly available at https://github.com/jiesihu/Neuroverse3D.
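To make the two technical terms in the abstract concrete, the following is a minimal sketch, not the SynthICL implementation. It assumes the simplest form of domain randomization (each label in a segmentation map receives a random mean intensity, plus per-pixel Gaussian noise) and the standard Dice coefficient for binary masks. The function names `synthesize_image` and `dice`, the 2D list representation, and the `noise_sigma` parameter are all hypothetical choices made for illustration.

```python
import random


def synthesize_image(label_map, seed=None, noise_sigma=0.05):
    """Render a synthetic intensity image from a 2D integer label map.

    Illustrative domain randomization only: every distinct label is given
    a random mean intensity in [0, 1), then Gaussian noise is added per pixel.
    """
    rng = random.Random(seed)
    labels = sorted({value for row in label_map for value in row})
    means = {label: rng.random() for label in labels}
    return [
        [means[value] + rng.gauss(0.0, noise_sigma) for value in row]
        for row in label_map
    ]


def dice(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for 2D binary masks (0/1)."""
    intersection = sum(
        p * t
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

Calling `synthesize_image` repeatedly on the same label map with different seeds yields images that share anatomy but differ in appearance, which is the basic idea behind training on randomized synthetic data; `dice` is the overlap score in which the reported gains are measured.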