IV · CV · Feb 28, 2025

Autoregressive Medical Image Segmentation via Next-Scale Mask Prediction

arXiv:2502.20784v1 · 3 citations · h-index: 6 · MICCAI
Originality: Incremental advance
AI Analysis

This work addresses segmentation challenges in complex anatomical regions for medical imaging applications, representing an incremental improvement over existing multi-scale methods.

The paper tackles insufficient inter-scale dependencies in medical image segmentation by proposing AR-Seg, an autoregressive framework that progressively predicts next-scale masks. The authors report that it outperforms state-of-the-art methods on two benchmark datasets covering different imaging modalities.

While deep learning has significantly advanced medical image segmentation, most existing methods still struggle with handling complex anatomical regions. Cascaded or deep supervision-based approaches attempt to address this challenge through multi-scale feature learning but fail to establish sufficient inter-scale dependencies, as each scale relies solely on the features of the immediate predecessor. To this end, we propose the AutoRegressive Segmentation framework via next-scale mask prediction, termed AR-Seg, which progressively predicts the next-scale mask by explicitly modeling dependencies across all previous scales within a unified architecture. AR-Seg introduces three innovations: (1) a multi-scale mask autoencoder that quantizes the mask into multi-scale token maps to capture hierarchical anatomical structures, (2) a next-scale autoregressive mechanism that progressively predicts next-scale masks to enable sufficient inter-scale dependencies, and (3) a consensus-aggregation strategy that combines multiple sampled results to generate a more accurate mask, further improving segmentation robustness. Extensive experimental results on two benchmark datasets with different modalities demonstrate that AR-Seg outperforms state-of-the-art methods while explicitly visualizing the intermediate coarse-to-fine segmentation process.
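To make two of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes binary masks stored as nested Python lists. The helper names `downsample_mask`, `mask_pyramid`, and `consensus_mask` are hypothetical. Block-wise majority pooling stands in for the learned multi-scale mask autoencoder, and a per-pixel majority vote stands in for the consensus-aggregation strategy, whose exact rule the abstract does not specify.

```python
from typing import List

Mask = List[List[int]]  # binary mask stored as rows of 0/1 values


def downsample_mask(mask: Mask, factor: int) -> Mask:
    """Coarsen a binary mask by majority vote inside each factor x factor block.

    Assumption: plain pooling replaces the paper's learned quantization.
    """
    height, width = len(mask), len(mask[0])
    if height % factor or width % factor:
        raise ValueError("mask dimensions must be divisible by factor")
    coarse = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block_sum = sum(
                mask[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            # Mark the block as foreground if at least half of it is foreground.
            row.append(1 if 2 * block_sum >= factor * factor else 0)
        coarse.append(row)
    return coarse


def mask_pyramid(mask: Mask, factors: List[int]) -> List[Mask]:
    """Return one mask per factor, e.g. factors=[4, 2, 1] for coarse to fine."""
    return [downsample_mask(mask, f) for f in factors]


def consensus_mask(samples: List[Mask]) -> Mask:
    """Combine equally sized sampled masks by a strict per-pixel majority vote.

    Assumption: majority voting is an illustrative aggregation rule only.
    """
    if not samples:
        raise ValueError("at least one sampled mask is required")
    n = len(samples)
    height, width = len(samples[0]), len(samples[0][0])
    return [
        [1 if 2 * sum(s[y][x] for s in samples) > n else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    full = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1],
    ]
    for scale in mask_pyramid(full, [4, 2, 1]):
        print(scale)
    print(consensus_mask([[[1, 0, 0]], [[1, 1, 0]], [[0, 1, 0]]]))
```

In this sketch the pyramid is derived from a known mask, as a tokenizer would see it during training. In the framework described above, each finer scale would instead be predicted conditioned on all coarser scales, and several sampled predictions would be aggregated at the end.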

