CV · May 14

SpectraFlow: Unifying Structural Pretraining and Frequency Adaptation for Medical Image Segmentation

Zhiquan Chen, Haitao Wang, Guowei Zou, Hejun Wu

arXiv:2605.14566

AI Analysis

This work addresses the challenge of medical image segmentation with limited annotations, offering improved generalization and boundary delineation for clinical applications.

SpectraFlow introduces a two-stage framework for medical image segmentation that combines structure-aware encoder pretraining via Mixed-Domain MeanFlow with a boundary-oriented decoder using frequency-adaptive convolutions, achieving consistent improvements over state-of-the-art methods on ISIC-2016, Kvasir-SEG, and GlaS, especially in low-data regimes.

Medical image segmentation remains challenging in low-data regimes, where scarce annotations often yield poor generalization and ambiguous boundaries with missing fine structures. Recent self-supervised pretraining has improved transferability, but it often exhibits a texture bias. In contrast, accurate segmentation is inherently geometry-aware and depends on both topological consistency and precise boundary preservation. To address this problem, we propose a two-stage framework that couples structure-aware encoder pretraining with boundary-oriented decoding. In Stage-1, we aim to learn structure-aware representations for downstream segmentation in low-data regimes. To this end, we propose Mixed-Domain MeanFlow Pretraining, which aligns images and binary masks in a shared latent space through latent transport regression, where masks act as conditional structural guidance rather than prediction targets, making the pretraining task-agnostic. To further improve training stability under scarce supervision, we incorporate a lightweight Dispersive Loss to prevent representation collapse. In Stage-2, we fine-tune the pretrained encoder with a lightweight decoder that combines Direct Attentional Fusion for adaptive cross-scale gating and Frequency-Directional Dynamic Convolution for high-frequency boundary refinement under appearance variation. Experiments on ISIC-2016, Kvasir-SEG, and GlaS demonstrate consistent gains over state-of-the-art methods, with improved robustness in low-data settings and sharper boundary delineation.
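The abstract says a lightweight Dispersive Loss is used to prevent representation collapse but does not give its form. As a rough illustration only, the sketch below assumes an InfoNCE-style repulsion term without positive pairs: the log of the mean of exp(-squared distance / tau) over all pairs in a batch. The function name `dispersive_loss`, the temperature `tau`, and the toy feature vectors are hypothetical and are not taken from the paper.

```python
import math

def dispersive_loss(features, tau=0.5):
    """Assumed form: log of the mean of exp(-||z_i - z_j||^2 / tau) over all pairs.

    `features` is a list of equal-length feature vectors (one per sample).
    Lower values mean the batch is more spread out in feature space.
    """
    n = len(features)
    total = 0.0
    for zi in features:
        for zj in features:
            # Squared Euclidean distance between the two feature vectors.
            sq_dist = sum((a - b) ** 2 for a, b in zip(zi, zj))
            total += math.exp(-sq_dist / tau)
    return math.log(total / (n * n))

# Toy batches (hypothetical values): identical features vs. well-separated ones.
collapsed = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
spread = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]]

loss_collapsed = dispersive_loss(collapsed)
loss_spread = dispersive_loss(spread)
```

Under this assumed form, a fully collapsed batch gives the maximum value of the loss (zero), and spreading the features apart lowers it, so minimizing the term pushes representations away from one another. How the paper weights this term against the latent transport regression objective is not stated in the abstract.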
