CV · AI · Mar 31, 2025

WaveFormer: A 3D Transformer with Wavelet-Driven Feature Representation for Efficient Medical Image Segmentation

arXiv:2503.23764v2 · 4 citations · h-index: 7 · MICCAI
Originality: Incremental advance
AI Analysis

The paper addresses efficiency challenges that limit real-world deployment of medical image segmentation models. The advance is incremental: it builds on existing transformer architectures and adds wavelet-based enhancements.

The paper targets two problems in 3D transformer-based medical image segmentation: high memory overhead and insufficient capture of local features. It introduces WaveFormer, which uses discrete wavelet transformations to preserve both global context and high-frequency details. On BraTS2023, FLARE2021, and KiTS2023, it performs on par with state-of-the-art methods while significantly reducing parameter count and computational complexity.

Transformer-based architectures have advanced medical image analysis by effectively modeling long-range dependencies, yet they often struggle in 3D settings due to substantial memory overhead and insufficient capture of fine-grained local features. We address these limitations with WaveFormer, a novel 3D-transformer that: i) leverages the fundamental frequency-domain properties of features for contextual representation, and ii) is inspired by the top-down mechanism of the human visual recognition system, making it a biologically motivated architecture. By employing discrete wavelet transformations (DWT) at multiple scales, WaveFormer preserves both global context and high-frequency details while replacing heavy upsampling layers with efficient wavelet-based summarization and reconstruction. This significantly reduces the number of parameters, which is critical for real-world deployment where computational resources and training times are constrained. Furthermore, the model is generic and easily adaptable to diverse applications. Evaluations on BraTS2023, FLARE2021, and KiTS2023 demonstrate performance on par with state-of-the-art methods while offering substantially lower computational complexity.
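To make the wavelet idea in the abstract concrete, here is a minimal sketch of a single-level 3D Haar DWT and its inverse in NumPy. It is not the authors' implementation: the Haar wavelet, the function names (`haar_dwt_3d`, `haar_idwt_3d`, `multiscale_lll`), and the subband keys (`"LLL"`, `"LLH"`, ...) are assumptions chosen for illustration, and the paper may use a different wavelet or library. The sketch only shows why a DWT is attractive here: the low-frequency `LLL` subband holds a coarse summary with one eighth of the voxels, the other seven subbands hold high-frequency detail, and the original volume can be recovered exactly without any learned upsampling weights.

```python
import numpy as np


def haar_dwt_1d(x, axis):
    """One level of the orthonormal Haar DWT along one axis (even length assumed)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)   # local averages: coarse content
    high = (even - odd) / np.sqrt(2.0)  # local differences: fine detail
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def haar_idwt_1d(low, high, axis):
    """Invert haar_dwt_1d along the given axis."""
    low = np.moveaxis(low, axis, 0)
    high = np.moveaxis(high, axis, 0)
    out = np.empty((2 * low.shape[0],) + low.shape[1:], dtype=low.dtype)
    out[0::2] = (low + high) / np.sqrt(2.0)
    out[1::2] = (low - high) / np.sqrt(2.0)
    return np.moveaxis(out, 0, axis)


def haar_dwt_3d(volume):
    """Split a (D, H, W) volume into 8 subbands keyed 'LLL', 'LLH', ..., 'HHH'.

    The i-th letter of a key says whether axis i was low- or high-pass filtered.
    """
    bands = {"": volume}
    for axis in range(3):
        split = {}
        for key, arr in bands.items():
            low, high = haar_dwt_1d(arr, axis)
            split[key + "L"] = low
            split[key + "H"] = high
        bands = split
    return bands


def haar_idwt_3d(bands):
    """Reconstruct the volume from the 8 subbands produced by haar_dwt_3d."""
    bands = dict(bands)
    for axis in (2, 1, 0):  # undo the axes in reverse order
        merged = {}
        for prefix in {key[:-1] for key in bands}:
            merged[prefix] = haar_idwt_1d(bands[prefix + "L"], bands[prefix + "H"], axis)
        bands = merged
    return bands[""]


def multiscale_lll(volume, levels):
    """Repeatedly decompose the LLL subband to get coarser and coarser summaries."""
    approximations = []
    current = volume
    for _ in range(levels):
        current = haar_dwt_3d(current)["LLL"]
        approximations.append(current)
    return approximations


# Usage on a synthetic volume (random data, not a medical scan).
rng = np.random.default_rng(0)
volume = rng.standard_normal((16, 16, 16))
bands = haar_dwt_3d(volume)
recon = haar_idwt_3d(bands)
pyramid = multiscale_lll(volume, levels=2)
```

Because the transform is orthonormal and fixed, both the summarization step (keeping `LLL`) and the reconstruction step (`haar_idwt_3d`) add no trainable parameters, which is consistent with the abstract's claim that wavelet-based reconstruction can stand in for heavier upsampling layers.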

