SpectraFlow: Unifying Structural Pretraining and Frequency Adaptation for Medical Image Segmentation
This work addresses the challenge of medical image segmentation with limited annotations, offering improved generalization and boundary delineation for clinical applications.
SpectraFlow is a two-stage framework for medical image segmentation that combines structure-aware encoder pretraining via Mixed-Domain MeanFlow with a boundary-oriented decoder built on frequency-adaptive convolutions. It achieves consistent improvements over state-of-the-art methods on ISIC-2016, Kvasir-SEG, and GlaS, especially in low-data regimes.
Medical image segmentation remains challenging in low-data regimes, where scarce annotations often yield poor generalization and ambiguous boundaries with missing fine structures. Recent self-supervised pretraining has improved transferability, but the resulting representations often exhibit a texture bias, whereas accurate segmentation is inherently geometry-aware and depends on both topological consistency and precise boundary preservation. To address this mismatch, we propose a two-stage framework that couples structure-aware encoder pretraining with boundary-oriented decoding. In Stage-1, we propose Mixed-Domain MeanFlow Pretraining to learn structure-aware representations for downstream segmentation. It aligns images and binary masks in a shared latent space through latent transport regression, where masks act as conditional structural guidance rather than prediction targets, making the pretraining task-agnostic. To further improve training stability under scarce supervision, we incorporate a lightweight Dispersive Loss to prevent representation collapse. In Stage-2, we fine-tune the pretrained encoder with a lightweight decoder that combines Direct Attentional Fusion for adaptive cross-scale gating and Frequency-Directional Dynamic Convolution for high-frequency boundary refinement under appearance variation. Experiments on ISIC-2016, Kvasir-SEG, and GlaS demonstrate consistent gains over state-of-the-art methods, with improved robustness in low-data settings and sharper boundary delineation.
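To make the role of the Dispersive Loss concrete, the following is a minimal sketch of one possible form of such a regularizer. The abstract does not specify the formulation, so the sketch assumes an InfoNCE-style repulsion term without positive pairs, using squared Euclidean distance between latent vectors in a batch. The function name `dispersive_loss` and the temperature parameter `tau` are hypothetical and chosen for illustration only.

```python
import numpy as np

def dispersive_loss(z, tau=0.5):
    """Assumed form: log of the batch mean of exp(-||z_i - z_j||^2 / tau).

    The value is 0 when all latents coincide (full collapse) and becomes
    more negative as the latents spread apart, so minimizing it pushes
    representations away from each other.
    """
    z = np.asarray(z, dtype=np.float64)
    z = z.reshape(z.shape[0], -1)            # flatten each latent to a vector
    sq = np.sum(z ** 2, axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (z @ z.T)
    d2 = np.maximum(d2, 0.0)                 # clamp small negative round-off
    logits = -d2 / tau
    m = logits.max()                         # numerically stable log-mean-exp
    return m + np.log(np.mean(np.exp(logits - m)))

# Illustrative comparison: a collapsed batch versus a dispersed batch.
rng = np.random.default_rng(0)
collapsed = dispersive_loss(np.ones((8, 4)))
dispersed = dispersive_loss(3.0 * rng.normal(size=(8, 4)))
print(collapsed, dispersed)
```

In a training loop, a term of this kind would typically be added to the main pretraining objective with a small weight, so that it discourages collapse without dominating the transport regression. The weighting and the choice of distance are assumptions here, not details given in the abstract.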