CVFeb 23

HD-TTA: Hypothesis-Driven Test-Time Adaptation for Safer Brain Tumor Segmentation

arXiv:2602.19454v11.5h-index: 47

Originality Highly original

AI Analysis

This work addresses safety-critical issues in medical segmentation for clinical deployment, offering a novel approach to mitigate risks like tumor spillage into healthy tissue.

The paper tackles the problem of unsafe tumor mask predictions in brain tumor segmentation during test-time adaptation by proposing a hypothesis-driven framework that generates competing geometric hypotheses and selects the safest outcome, resulting in a reduction of Hausdorff Distance by approximately 6.4 mm and an improvement in Precision by over 4%.

Standard Test-Time Adaptation (TTA) methods typically treat inference as a blind optimization task, applying generic objectives to all or filtered test samples. In safety-critical medical segmentation, this lack of selectivity often causes the tumor mask to spill into healthy brain tissue or degrades predictions that were already correct. We propose Hypothesis-Driven TTA, a novel framework that reformulates adaptation as a dynamic decision process. Rather than forcing a single optimization trajectory, our method generates intuitive competing geometric hypotheses: compaction (is the prediction noisy? trim artifacts) versus inflation (is the valid tumor under-segmented? safely inflate to recover). It then employs a representation-guided selector to autonomously identify the safest outcome based on intrinsic texture consistency. Additionally, a pre-screening Gatekeeper prevents negative transfer by skipping adaptation on confident cases. We validate this proof-of-concept on a cross-domain binary brain tumor segmentation task, applying a source model trained on adult BraTS gliomas to unseen pediatric and more challenging meningioma target domains. HD-TTA improves safety-oriented outcomes (Hausdorff Distance (HD95) and Precision) over several state-of-the-art representative baselines in the challenging safety regime, reducing the HD95 by approximately 6.4 mm and improving Precision by over 4%, while maintaining comparable Dice scores. These results demonstrate that resolving the safety-adaptation trade-off via explicit hypothesis selection is a viable, robust path for safe clinical model deployment. Code will be made publicly available upon acceptance.

View on arXiv PDF

Similar